1. Introduction



Since the pioneering work [25] of Wigner in the fifties, random matrices have played a fundamental role in
modelling complex systems. The basic example is the Wigner matrix ensemble, consisting of $N \times N$ symmetric
or Hermitian matrices $H = (h_{ij})$ whose matrix entries are identically distributed random variables that are
independent up to the symmetry constraint $H = H^*$. From a physical point of view, these matrices represent
Hamilton operators of disordered mean-field quantum systems, where the quantum transition rate from state
$i$ to state $j$ is given by the entry $h_{ij}$.

A central problem in the theory of random matrices is to establish the local universality of the spectrum.
Wigner observed that the distribution of the distances between consecutive eigenvalues (the gap distribution)
in complex physical systems follows a universal pattern. The Wigner-Dyson-Gaudin-Mehta conjecture,
formalized in [20], states that this gap distribution is universal in the sense that it depends only on the
symmetry class of the matrix, but is otherwise independent of the details of the distribution of the matrix
entries. This conjecture has recently been established for all symmetry classes in a series of works [4, 11, 16];
an alternative approach is given in [24] for the special Hermitian case. The general approach of [4, 11, 16] to
prove universality consists of three steps: (i) establish a local semicircle law for the density of eigenvalues;
(ii) prove universality of Wigner matrices with a small Gaussian component by analysing the convergence
of Dyson Brownian motion to local equilibrium; (iii) remove the small Gaussian component by comparing
Green functions of Wigner ensembles with a few matching moments. For an overview of recent results and
this three-step strategy, see [13].



Wigner's vision was not restricted to Wigner matrices. In fact, he predicted that universality should hold
for any quantum system, described by a large Hamiltonian $H$, of sufficient complexity. In order to make such
complexity mathematically tractable, one typically replaces the detailed structure of $H$ with a statistical
description. In this phenomenological model, $H$ is drawn from a random ensemble whose distribution mimics
the true complexity. One prominent example where random matrix statistics are expected to hold is the
random Schrödinger operator in the delocalized regime. The random Schrödinger operator differs greatly
from Wigner matrices in that most of its entries vanish. It describes a model with spatial structure, in contrast
to the mean-field Wigner matrices where all matrix entries are of comparable size. In order to address the
question of universality of general disordered quantum systems, and in particular to probe Wigner's vision,
one therefore has to break the mean-field permutational symmetry of Wigner's original model, and hence
to allow the distribution of $h_{ij}$ to depend on $i$ and $j$ in a nontrivial fashion. For example, if the matrix
entries are labelled by a discrete torus $\mathbb{T} \subset \mathbb{Z}^d$ on the $d$-dimensional lattice, then the distribution of $h_{ij}$
may depend on the Euclidean distance $|i - j|$ between sites $i$ and $j$, thus introducing a nontrivial spatial
structure into the model. If $h_{ij} = 0$ for $|i - j| > 1$ we essentially obtain the random Schrödinger operator.
A random Schrödinger operator models a physical system with a short-range interaction, in contrast to the
infinite-range, mean-field interaction described by Wigner matrices. More generally, we may consider a band
matrix, characterized by the property that $h_{ij}$ becomes negligible if $|i - j|$ exceeds a certain parameter,
$W$, called the band width, describing the range of the interaction. Hence, by varying the band width $W$,
band matrices naturally interpolate between mean-field Wigner matrices and random Schrödinger operators;
see [23] for an overview.

For definiteness, let us focus on the case of a one-dimensional band matrix $H$. A fundamental conjecture,
supported by nonrigorous supersymmetric arguments as well as numerics [18], is that the local spectral
statistics of $H$ are governed by random matrix statistics for large $W$ and by Poisson statistics for small $W$.
This transition is in the spirit of the Anderson metal-insulator transition [18, 23], and is conjectured to be
sharp around the critical value $W = \sqrt{N}$. In other words, if $W \gg \sqrt{N}$, we expect the universality results
of [14, 16] to hold. In addition to a transition in the local spectral statistics, an accompanying transition is
conjectured to occur in the behaviour of the localization length of the eigenvectors of $H$, whereby in the large-$W$
regime they are expected to be completely delocalized and in the small-$W$ regime exponentially localized.
The localization length for band matrices was recently investigated in great detail in [5].

Although the Wigner-Dyson-Gaudin-Mehta conjecture was originally stated for Wigner matrices, the
methods of [4, 11, 16] also apply to certain ensembles with independent but not identically distributed entries,
which however retain the mean-field character of Wigner matrices. More precisely, they yield universality
provided the variances

\[ s_{ij} := \mathbb{E}|h_{ij}|^2 \]

of the matrix entries are of comparable size (but not necessarily equal):

\[ \frac{c}{N} \le s_{ij} \le \frac{C}{N} \tag{1.1} \]

for some positive constants $c$ and $C$. (Such matrices were called generalized Wigner matrices in [16].) This
condition admits a departure from spatial homogeneity, but still imposes a mean-field behaviour and hence
excludes genuinely inhomogeneous models such as band matrices.

In the three-step approach to universality outlined above, the first step is to establish the semicircle law
on very short scales. In the scaling of $H$ where its spectrum is asymptotically given by the interval $[-2, 2]$, the
typical distance between neighbouring eigenvalues is of order $1/N$. The number of eigenvalues in an interval
of length $\eta$ is typically of order $N\eta$. Thus, the smallest possible scale on which the empirical density may be
close to a deterministic density (in our case the semicircle law) is $\eta \gg 1/N$. If we characterize the empirical
spectral density around an energy $E$ on scale $\eta$ by its Stieltjes transform, $m_N(z) = N^{-1} \operatorname{Tr} (H - z)^{-1}$ for
$z = E + i\eta$, then the local semicircle law around the energy $E$ and in a spectral window of size $\eta$ is essentially
equivalent to

\[ |m_N(z) - m(z)| = o(1) \tag{1.2} \]

as $N \to \infty$, where $m(z)$ is the Stieltjes transform of the semicircle law. For any $\eta \gg 1/N$ (up to logarithmic
corrections) the asymptotics (1.2) in the bulk spectrum was first proved in [9] for Wigner matrices. The
optimal error bound of the form $O((N\eta)^{-1})$ (with an $N^{\varepsilon}$ correction) was first proved in [15] for the bulk
spectrum. This result was then extended to the spectral edges in [16]. Here the identical distribution of the
entries of $H$ was not required, but the upper bound in (1.1) on the variances was necessary.

Band matrices in $d$ dimensions with band width $W$ satisfy the weaker bound $s_{ij} \le C/W^d$. (Note that
the band width $W$ is typically much smaller than the linear size $L$ of the configuration space $\mathbb{T}$, i.e. the
bound $W^{-d}$ is much larger than the inverse number of lattice sites, $L^{-d} = |\mathbb{T}|^{-1} = N^{-1}$.) This motivates
us to consider even more general matrices, with the sole condition

\[ s_{ij} \le C/M \tag{1.3} \]

on the variances (instead of (1.1)). Here $M$ is a new parameter that typically satisfies $M \ll N$. (From now
on, the relation $A \ll B$ for two $N$-dependent quantities $A$ and $B$ means that $A \le N^{-\varepsilon} B$ for some positive
$\varepsilon > 0$.) The question of the validity of the local semicircle law under the assumption (1.3) was initiated
in [14], where (1.2) was proved with an error term of order $(M\eta)^{-1/2}$ away from the spectral edges.
The purpose of this paper is twofold. First, we prove a local semicircle law (1.2), under the variance
condition (1.3), with a stronger error bound of order $(M\eta)^{-1}$, including energies $E$ near the spectral edge.
Away from the spectral edge (and from the origin $E = 0$ if the matrix does not have a band structure), the
result holds for any $\eta \gg 1/M$. Near the edge there is a restriction on how small $\eta$ can be. This restriction
depends explicitly on a norm of the resolvent of the matrix of variances, $S = (s_{ij})$; we give explicit bounds
on this norm for various special cases of interest.

As a corollary, we derive bounds on the eigenvalue counting function and rigidity estimates on the
locations of the eigenvalues for a general class of matrices. Combined with an analysis of Dyson Brownian
motion and the Green function comparison method, this yields bulk universality of the local eigenvalue
statistics in a certain range of parameters, which depends on the matrix $S$. In particular, we extend bulk
universality, proved for generalized Wigner matrices in [14], to a large class of matrix ensembles where the
upper and lower bounds on the variances (1.1) are relaxed.

The main motivation for the generalizations in this paper is the Anderson transition for band matrices
outlined above. While not optimal, our results nevertheless imply that band matrices with a sufficiently
broad band plus a negligible mean-field component exhibit bulk universality: their local spectral statistics
are governed by random matrix statistics. For example, the local two-point correlation functions coincide if
$W \gg N^{33/34}$. Although eigenvector delocalization and random matrix statistics are conjectured to occur in
tandem, delocalization was actually proved in [5] under more general conditions than those under which we
establish random matrix statistics. In fact, the delocalization results of [5] hold for a mean-field component
as small as $(N/W^2)^{2/3}$, and, provided that $W \gg N^{4/5}$, the mean-field component may even vanish (resulting
in a genuine band matrix).

The second purpose of this paper is to provide a coherent, pedagogical, and self-contained proof of the
local semicircle law. In recent years, a series of papers [3, 8, 10, 14, 16], with gradually weaker assumptions,
was published on this topic. These papers often cited and relied on the previous ones. This made it
difficult for the interested reader to follow all the details of the argument. The basic strategy of our proof
(that is, using resolvents and large deviation bounds) was already used in [3, 8, 10, 14, 16]. In this paper
we not only streamline the argument for generalized Wigner matrices (satisfying (1.1)), but we also obtain
sharper bounds for random matrices satisfying the much weaker condition (1.3). This allows us to establish
universality results for a class of ensembles beyond generalized Wigner matrices.

Our proof is self-contained and simpler than those of [3, 14, 16]. In particular, we give a proof of the
Fluctuation Averaging Theorem, Theorems 4.6 and 4.7 below, which is considerably simpler than that of its
predecessors in [3, 15, 16]. In addition, we consistently use fluctuation averaging at several key steps of the
main argument, which allows us to shorten the proof and relax previous assumptions on the variances $s_{ij}$.
The reader who is mainly interested in the pedagogical presentation should focus on the simplest choice of
$S$, $s_{ij} = 1/N$, which corresponds to the standard Wigner matrix (for which $M = N$), and focus on Sections
2, 4, 5, and 6, as well as Appendix B.

We conclude this section with an outline of the paper. In Section 2 we define the model, introduce basic
definitions, and state the local semicircle law in full generality (Theorem 2.3). Section 3 is devoted to some
examples of random matrix models that satisfy our assumptions; for each example we give explicit bounds
on the spectral domain on which the local semicircle law holds. Sections 4, 5, and 6 are devoted to the proof
of the local semicircle law. Section 4 collects the basic tools that will be used throughout the proof. The
purpose of Section 5 is mainly pedagogical; in it, we state and prove a weaker form of the local semicircle
law, Theorem 5.1. The error bounds in Theorem 5.1 are identical to those of Theorem 2.3, but the spectral
domain on which they hold is smaller. Provided one stays away from the spectral edge, Theorems 5.1 and 2.3
are equivalent; near the edge, Theorem 2.3 is stronger. The proof of Theorem 5.1 is very short and contains
several key ideas from the proof of Theorem 2.3. The expert reader may therefore want to skip Section 5,
but for the reader looking for a pedagogical presentation we recommend first focusing on Sections 4 and 5
(along with Appendix B). The full proof of our main result, Theorem 2.3, is given in Section 6. In Sections
7 and 8 we draw consequences from Theorem 2.3. In Section 7 we derive estimates on the density of states
and the rigidity of the eigenvalue locations. In Section 8 we state and prove the universality of the local
spectral statistics in the bulk, and give applications to some concrete matrix models. In Appendix A we
derive explicit bounds on relevant norms of the resolvent of $S$ (denoted by the abstract control parameters
$\Gamma$ and $\widetilde{\Gamma}$), which are used to define the domains of applicability of Theorems 2.3 and 5.1. Finally, Appendix
B is devoted to the proof of the fluctuation averaging estimates, Theorems 4.6 and 4.7.

We use C to denote a generic large positive constant, which may depend on some fixed parameters and 
whose value may change from one expression to the next. Similarly, we use c to denote a generic small 
positive constant. 



2. Definitions and the main result 



Let $(h_{ij} : i \le j)$ be a family of independent, complex-valued random variables $h_{ij} \equiv h_{ij}^{(N)}$ satisfying $\mathbb{E} h_{ij} = 0$
and $h_{ii} \in \mathbb{R}$ for all $i$. For $i > j$ we define $h_{ij} := \bar{h}_{ji}$, and denote by $H \equiv H_N = (h_{ij})_{i,j=1}^N$ the $N \times N$ matrix
with entries $h_{ij}$. By definition, $H$ is Hermitian: $H = H^*$. We stress that all our results hold not only for
complex Hermitian matrices but also for real symmetric matrices. In fact, the symmetry class of $H$ plays
no role, and our results apply for instance in the case where some off-diagonal entries of $H$ are real and
some complex-valued. (In contrast to some other papers in the literature, in our terminology the concept of
Hermitian simply refers to the fact that $H = H^*$.)
We define

\[ s_{ij} := \mathbb{E}|h_{ij}|^2 , \qquad M \equiv M_N := \frac{1}{\max_{i,j} s_{ij}} . \tag{2.1} \]

In particular, we have the bound

\[ s_{ij} \le M^{-1} \tag{2.2} \]

for all $i$ and $j$. We regard $N$ as the fundamental parameter of our model, and $M$ as a function of $N$. We
introduce the $N \times N$ symmetric matrix $S \equiv S_N = (s_{ij})_{i,j=1}^N$. We assume that $S$ is (doubly) stochastic:

\[ \sum_j s_{ij} = 1 \tag{2.3} \]

for all $i$. For simplicity, we assume that $S$ is irreducible, so that 1 is a simple eigenvalue. (The case of
non-irreducible $S$ may be trivially dealt with by considering its irreducible components separately.) We shall
always assume the bounds

\[ N^{\delta} \le M \le N \tag{2.4} \]

for some fixed $\delta > 0$.

It is sometimes convenient to use the normalized entries

\[ \zeta_{ij} := (s_{ij})^{-1/2} h_{ij} , \tag{2.5} \]

which satisfy $\mathbb{E} \zeta_{ij} = 0$ and $\mathbb{E}|\zeta_{ij}|^2 = 1$. (If $s_{ij} = 0$ we set for convenience $\zeta_{ij}$ to be a normalized Gaussian,
so that these relations continue to hold. Of course in this case the law of $\zeta_{ij}$ is immaterial.) We assume that
the random variables $\zeta_{ij}$ have finite moments, uniformly in $N$, $i$, and $j$, in the sense that for all $p \in \mathbb{N}$ there
is a constant $\mu_p$ such that

\[ \mathbb{E}|\zeta_{ij}|^p \le \mu_p \tag{2.6} \]

for all $N$, $i$, and $j$. We make this assumption to streamline notation in the statements of results such as
Theorem 2.3 and the proofs. In fact, our results (and our proof) also cover the case where (2.6) holds for
some finite large $p$; see Remark 2.4.



Throughout the following we use a spectral parameter $z \in \mathbb{C}$ satisfying $\operatorname{Im} z > 0$. We use the notation

\[ z = E + i\eta \]

without further comment, and always assume that $\eta > 0$. Wigner's semicircle law $\varrho$ and its Stieltjes transform
$m$ are defined by

\[ \varrho(x) := \frac{1}{2\pi} \sqrt{(4 - x^2)_+} , \qquad m(z) := \int \frac{\varrho(x)}{x - z} \, dx . \tag{2.7} \]



To avoid confusion, we remark that $m$ was denoted by $m_{sc}$ in the papers [3, 4, 8, 12, 14, 16], in which $m$ had
a different meaning from (2.7). It is well known that the Stieltjes transform $m$ is the unique solution of

\[ m(z) + \frac{1}{m(z) + z} = 0 \tag{2.8} \]

satisfying $\operatorname{Im} m(z) > 0$ for $\operatorname{Im} z > 0$. Thus we have

\[ m(z) = \frac{-z + \sqrt{z^2 - 4}}{2} . \tag{2.9} \]

Some basic estimates on $m$ are collected in Lemma 4.3 below.
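As a quick numerical illustration of (2.8) and (2.9), the following sketch (not part of the original argument) evaluates $m(z)$ by selecting the root of $m^2 + zm + 1 = 0$ that lies in the upper half-plane. The function names `m_semicircle` and `self_consistent_defect` are our own choices, and only the Python standard library is assumed.

```python
import cmath


def m_semicircle(z: complex) -> complex:
    """Root of m^2 + z*m + 1 = 0 with Im m > 0, for Im z > 0 (cf. (2.8) and (2.9))."""
    if z.imag <= 0:
        raise ValueError("the spectral parameter must satisfy Im z > 0")
    root = cmath.sqrt(z * z - 4)
    m = (-z + root) / 2
    # The two roots have product 1, so exactly one of them has positive imaginary part.
    return m if m.imag > 0 else (-z - root) / 2


def self_consistent_defect(z: complex) -> float:
    """Absolute value of m(z) + 1/(m(z) + z), which should vanish by (2.8)."""
    m = m_semicircle(z)
    return abs(m + 1 / (m + z))
```

Choosing the root by the sign of its imaginary part sidesteps the question of which branch of the square root in (2.9) a given library implements.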
An important parameter of the model is

\[ \Gamma_N(z) \equiv \Gamma(z) := \left\| (1 - m(z)^2 S)^{-1} \right\|_{\ell^\infty \to \ell^\infty} . \tag{2.10} \]

(Here we use the notation $\|A\|_{\ell^\infty \to \ell^\infty} = \max_i \sum_j |A_{ij}|$ for the operator norm on $\ell^\infty(\mathbb{C}^N)$.)
A related quantity is obtained by restricting the operator $(1 - m(z)^2 S)^{-1}$ to the subspace $e^\perp$ orthogonal to
the constant vector $e := N^{-1/2}(1, 1, \dots, 1)^*$. Since $S$ is stochastic, we have the estimate $-1 \le S \le 1$ and 1
is a simple eigenvalue of $S$ with eigenvector $e$. Set

\[ \widetilde{\Gamma}_N(z) \equiv \widetilde{\Gamma}(z) := \left\| (1 - m(z)^2 S)^{-1} \big|_{e^\perp} \right\|_{\ell^\infty \to \ell^\infty} , \tag{2.11} \]

the norm of $(1 - m(z)^2 S)^{-1}$ restricted to the subspace orthogonal to the constants. Clearly, $\widetilde{\Gamma}(z) \le \Gamma(z)$.
Basic estimates on $\Gamma$ and $\widetilde{\Gamma}$ are collected in Proposition A.2 below. Many estimates in this paper depend
critically on $\Gamma$ and $\widetilde{\Gamma}$. Indeed, these parameters quantify the stability of certain self-consistent equations
that underlie our proof. However, $\Gamma$ and $\widetilde{\Gamma}$ remain bounded (up to a factor $\log N$) provided $E = \operatorname{Re} z$ is
separated from the set $\{-2, 0, 2\}$; for band matrices (see Example 3.2) it suffices that $E$ be separated from
the spectral edges $\{-2, 2\}$; see Appendix A. At a first reading, we recommend that the reader neglect $\Gamma$ and
$\widetilde{\Gamma}$ (i.e. replace them with a constant). For band matrices, this amounts to focusing on the local semicircle
law in the bulk of the spectrum.

We define the resolvent or Green function of $H$ through

\[ G(z) := (H - z)^{-1} , \]

and denote its entries by $G_{ij}(z)$. The Stieltjes transform of the empirical spectral measure of $H$ is

\[ m_N(z) := \frac{1}{N} \operatorname{Tr} G(z) . \tag{2.12} \]
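The definition (2.12) is easy to probe numerically. The sketch below is an illustration only: it assumes NumPy, uses a real symmetric matrix with Gaussian entries of variance $1/N$ (so that $s_{ij} = 1/N$ and $M = N$), and all function names are hypothetical. It computes $m_N(z)$ from the eigenvalues of $H$ and compares it with the semicircle value $m(z)$.

```python
import cmath

import numpy as np


def sample_wigner(n, rng):
    """Real symmetric matrix whose entries on and above the diagonal are independent N(0, 1/n)."""
    a = rng.standard_normal((n, n)) / np.sqrt(n)
    # Keep the diagonal and strict upper triangle, then mirror the strict upper triangle.
    return np.triu(a) + np.triu(a, 1).T


def m_semicircle(z):
    """Root of m^2 + z*m + 1 = 0 with positive imaginary part (assumes Im z > 0)."""
    root = cmath.sqrt(z * z - 4)
    m = (-z + root) / 2
    return m if m.imag > 0 else (-z - root) / 2


def empirical_stieltjes(h, z):
    """m_N(z) = N^{-1} Tr (H - z)^{-1}, evaluated through the eigenvalues of H."""
    eigenvalues = np.linalg.eigvalsh(h)
    return np.mean(1.0 / (eigenvalues - z))


def semicircle_error(n, z, seed=0):
    """|m_N(z) - m(z)| for one sample of size n drawn with the given seed."""
    rng = np.random.default_rng(seed)
    return abs(empirical_stieltjes(sample_wigner(n, rng), z) - m_semicircle(z))
```

For instance, `semicircle_error(600, 0.3 + 0.2j)` can be compared with $1/(N\eta) = 1/120$; a single sample only suggests the order of magnitude and is no substitute for the high-probability statements proved in this paper.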

The following definition introduces a notion of a high-probability bound that is suited for our purposes. 
It was introduced (in a slightly different form) in [6].

Definition 2.1 (Stochastic domination). Let

\[ X = \left( X^{(N)}(u) : N \in \mathbb{N}, u \in U^{(N)} \right) , \qquad Y = \left( Y^{(N)}(u) : N \in \mathbb{N}, u \in U^{(N)} \right) \]

be two families of nonnegative random variables, where $U^{(N)}$ is a possibly $N$-dependent parameter set. We
say that $X$ is stochastically dominated by $Y$, uniformly in $u$, if for all (small) $\varepsilon > 0$ and (large) $D > 0$ we
have

\[ \sup_{u \in U^{(N)}} \mathbb{P}\left[ X^{(N)}(u) > N^{\varepsilon} Y^{(N)}(u) \right] \le N^{-D} \]

for large enough $N \ge N_0(\varepsilon, D)$. Unless stated otherwise, throughout this paper the stochastic domination
will always be uniform in all parameters apart from the parameter $\delta$ in (2.4) and the sequence of constants
$\mu_p$ in (2.6); thus, $N_0(\varepsilon, D)$ also depends on $\delta$ and $\mu_p$. If $X$ is stochastically dominated by $Y$, uniformly in
$u$, we use the notation $X \prec Y$. Moreover, if for some complex family $X$ we have $|X| \prec Y$ we also write
$X = O_\prec(Y)$.



For example, using Chebyshev's inequality and (2.6) one easily finds that

\[ |h_{ij}| \prec (s_{ij})^{1/2} \prec M^{-1/2} , \tag{2.13} \]

so that we may also write $h_{ij} = O_\prec((s_{ij})^{1/2})$. Another simple, but useful, example is a family of events
$\Xi \equiv \Xi^{(N)}$ with asymptotically very high probability: If $\mathbb{P}(\Xi^c) \le N^{-D}$ for any $D > 0$ and $N \ge N_0(D)$, then
the indicator function $\mathbf{1}(\Xi)$ of $\Xi$ satisfies $1 - \mathbf{1}(\Xi) \prec 0$.
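For the reader's convenience, here is one way to spell out the Chebyshev argument behind the first relation in (2.13); this is a sketch added for illustration, with the choice of $p$ being ours. Fix $\varepsilon > 0$ and $D > 0$, and choose an integer $p \ge (D + 1)/\varepsilon$. If $s_{ij} > 0$, then by the definition (2.5) of the normalized entries and the moment bound (2.6),

\[ \mathbb{P}\left( |h_{ij}| > N^{\varepsilon} (s_{ij})^{1/2} \right) = \mathbb{P}\left( |\zeta_{ij}| > N^{\varepsilon} \right) \le \frac{\mathbb{E}|\zeta_{ij}|^p}{N^{\varepsilon p}} \le \mu_p N^{-D-1} \le N^{-D} \]

as soon as $N \ge \mu_p$. If $s_{ij} = 0$ then $h_{ij} = 0$ almost surely and there is nothing to prove. The bound is uniform in $i$ and $j$ because $\mu_p$ is. The second relation in (2.13) is deterministic and follows from (2.2).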

The relation $\prec$ is a partial ordering, i.e. it is transitive and it satisfies the familiar arithmetic rules of
order relations. For instance if $X_1 \prec Y_1$ and $X_2 \prec Y_2$ then $X_1 + X_2 \prec Y_1 + Y_2$ and $X_1 X_2 \prec Y_1 Y_2$. More
general statements in this spirit are given in Lemma 4.4 below.

Definition 2.2 (Spectral domain). We call an $N$-dependent family

\[ \mathbf{D} \equiv \mathbf{D}^{(N)} \subset \left\{ z : |E| \le 10 , \ M^{-1} \le \eta \le 10 \right\} \]

a spectral domain. (Recall that $M \equiv M_N$ depends on $N$.)



In this paper we always consider families $X^{(N)}(u) = X_i^{(N)}(z)$ indexed by $u = (z, i)$, where $z$ takes on
values in some spectral domain $\mathbf{D}$, and $i$ takes on values in some finite (possibly $N$-dependent or empty)
index set. The stochastic domination $X \prec Y$ of such families will always be uniform in $z$ and $i$, and we
usually do not state this explicitly. Usually, which spectral domain $\mathbf{D}$ is meant will be clear from the context,
in which case we shall not mention it explicitly.



In this paper we shall make use of two spectral domains, $\mathbf{S}$ defined in (5.2) and $\widetilde{\mathbf{S}}$ defined in (2.17).
Our main result is formulated on the larger of these domains, $\widetilde{\mathbf{S}}$. In order to define it, we introduce an
$E$-dependent lower boundary $\widetilde{\eta}_E$ on the spectral domain. We choose a (small) positive constant $\gamma$, and
define for each $E \in [-10, 10]$

\[ \widetilde{\eta}_E := \min \left\{ \eta : \frac{1}{M\eta} \le \min \left\{ \frac{M^{-\gamma}}{\widetilde{\Gamma}(z)^3} , \frac{M^{-2\gamma}}{\widetilde{\Gamma}(z)^4 \operatorname{Im} m(z)} \right\} \text{ for all } z \in [E + i\eta, E + 10i] \right\} . \tag{2.14} \]

Note that $\widetilde{\eta}_E$ depends on $\gamma$, but we do not explicitly indicate this dependence since we regard $\gamma$ as fixed.
At a first reading we advise the reader to think of $\gamma$ as being zero. Note also that the lower bound in (A.3)
below implies that $\widetilde{\eta}_E \ge M^{-1}$. We also define the distance to the spectral edge,

\[ \kappa \equiv \kappa_E := \big| |E| - 2 \big| . \tag{2.15} \]

Finally, we introduce the fundamental control parameter

\[ \Pi(z) := \sqrt{\frac{\operatorname{Im} m(z)}{M\eta}} + \frac{1}{M\eta} , \tag{2.16} \]

which will be used throughout this paper as a sharp, deterministic upper bound on the entries of $G$. Note
that the condition in the definition of $\widetilde{\eta}_E$ states that the first term of $\Pi$ is bounded by $M^{-\gamma} \widetilde{\Gamma}^{-2}$ and the
second term by $M^{-\gamma} \widetilde{\Gamma}^{-3}$. We may now state our main result.



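Theorem 2.3 (Local semicircle law). Fix $\gamma \in (0, 1/2)$ and define the spectral domain

\[ \widetilde{\mathbf{S}} \equiv \widetilde{\mathbf{S}}^{(N)}(\gamma) := \left\{ E + i\eta : |E| \le 10 , \ \widetilde{\eta}_E \le \eta \le 10 \right\} . \tag{2.17} \]

We have the bounds

\[ \max_{i,j} \left| G_{ij}(z) - \delta_{ij} m(z) \right| \prec \Pi(z) \tag{2.18} \]

uniformly in $z \in \widetilde{\mathbf{S}}$, as well as

\[ \left| m_N(z) - m(z) \right| \prec \frac{1}{M\eta} \tag{2.19} \]

uniformly in $z \in \widetilde{\mathbf{S}}$. Moreover, outside of the spectrum we have the stronger estimate

\[ \left| m_N(z) - m(z) \right| \prec \frac{1}{M(\kappa + \eta)} + \frac{1}{(M\eta)^2 \sqrt{\kappa + \eta}} \tag{2.20} \]

uniformly in $z \in \widetilde{\mathbf{S}} \cap \{ z : |E| \ge 2 , \ M\eta\sqrt{\kappa + \eta} \ge M^{\gamma} \}$.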



We remark that the main estimate for the Stieltjes transform $m_N$ is (2.19). The other estimate (2.20) is
mainly useful for controlling the norm of $H$, which we do in Section 7. We also recall that uniformity for the
spectral parameter $z$ means that the threshold $N_0(\varepsilon, D)$ in the definition of $\prec$ is independent of the choice of
$z$ within the indicated spectral domain. As stated in Definition 2.1, this uniformity holds for all statements
containing $\prec$, and is not explicitly mentioned in the following; all of our arguments are trivially uniform in
$z$ and any matrix indices.



Remark 2.4. Theorem 2.3 has the following variant for matrix entries where the condition (2.6) is only
imposed for some large but fixed $p$. More precisely, for any $\varepsilon > 0$ and $D > 0$ there exists a constant $p(\varepsilon, D)$
such that if (2.6) holds for $p = p(\varepsilon, D)$ then

\[ \mathbb{P}\left( |m_N(z) - m(z)| > N^{\varepsilon} (M\eta)^{-1} \right) \le N^{-D} \]

for all $z \in \widetilde{\mathbf{S}}$ and $N \ge N_0(\varepsilon, D)$. An analogous estimate replaces (2.18) and (2.20). The proof of this variant
is the same as that of Theorem 2.3.



Remark 2.5. Most of the previous works [3, 8, 10, 14, 16] assumed a stronger, subexponential decay condition
on $\zeta_{ij}$ instead of (2.6). Under the subexponential decay condition, certain probability estimates in the
results were somewhat stronger and precise tolerance thresholds were sharper. Roughly, this corresponds to
operating with a modified definition of $\prec$, where the factors $N^{\varepsilon}$ are replaced by high powers of $\log N$ and
the polynomial probability bound $N^{-D}$ is replaced with a subexponential one. The proofs of the current
paper can be easily adjusted to such a setup, but we shall not pursue this further.
A local semicircle law for Wigner matrices on the optimal scale $\eta \gtrsim 1/N$ was first obtained in [9]. The
optimal error estimates in the bulk were proved in [15], and extended to the edges in [16]. These estimates
underlie the derivation of rigidity estimates for individual eigenvalues, which in turn were used in [16] to
prove Dyson's conjecture on the optimal local relaxation time for the Dyson Brownian motion.



Apart from the somewhat different assumption on the tails of the entries of $H$ (see Remark 2.5), Theorem 2.3,
when restricted to generalized Wigner matrices, subsumes all previous local semicircle laws obtained
in [8, 10, 15, 16]. For band matrices, a local semicircle law was proved in [14]. (In fact, in [14] the band
structure was not required; only the conditions (2.2), (2.3), and the subexponential decay condition for the matrix
entries (instead of (2.6)) were used.) Theorem 2.3 improves this result in several ways. First, the error bounds
in (2.18) and (2.19) are uniform in $E$, even for $E$ near the spectral edge; the corresponding bounds in Theorem 2.1
of [14] diverged as $\kappa^{-1}$. Second, the bound (2.19) on the Stieltjes transform is better than (2.16)
by a factor $(M\eta)^{-1/2}$. This improvement is due to exploiting the fluctuation averaging mechanism of
Theorem 4.6. Third, the domain of $\eta$ for which Theorem 2.3 applies is essentially $\eta \gg \kappa^{-7/2} M^{-1}$, which is
somewhat larger than the domain $\eta \gg \kappa^{-4} M^{-1}$ of Theorem 2.1 of [14].

While Theorem 2.3 subsumes several previous local semicircle laws, two previous results are not covered.
The local semicircle law for sparse matrices proved in [8] does not follow from Theorem 2.3. However, the
argument of this paper may be modified so as to include sparse matrices as well; we do not pursue this issue
further. The local semicircle law for one-dimensional band matrices given in Theorem 2.2 of [5] is, however,
of a very different nature, and may not be recovered using the methods of the current paper. Under the
conditions $W \gg N^{4/5}$ and $\eta \gg N^2/W^3$, Theorem 2.2 of [5] shows that (focusing for simplicity on the
one-dimensional case)

\[ \left| G_{ij} - \delta_{ij} m \right| \prec \frac{1}{(N\eta)^{1/2}} + \frac{1}{(W\sqrt{\eta})^{1/2}} \tag{2.21} \]

in the bulk spectrum, which is stronger than the bound of order $(W\eta)^{-1/2}$ in (2.18). The proof of (2.21)
relies on a very general fluctuation averaging result from [6], which is considerably stronger than Theorems
4.6 and 4.7; see Remark 4.8 below. The key open problem for band matrices is to establish a local semicircle
law on a scale $\eta$ below $W^{-1}$. The estimate (2.21) suggests that the resolvent entries should remain bounded
throughout the range $\eta \gtrsim \max\{N^{-1}, W^{-2}\}$.

The local semicircle law, Theorem 2.3, has numerous consequences, several of which are formulated in
Sections 7 and 8. Here we only sketch them. Theorem 7.5 states that the empirical counting function
converges to the counting function of the semicircle law. The precision is of order $M^{-1}$ provided that we
have the lower bound $s_{ij} \ge c/N$ for some constant $c > 0$. As a consequence, Theorem 7.6 states that the
bulk eigenvalues are rigid on scales of order $M^{-1}$. Under the same condition, in Theorem 8.2 we prove the
universality of the local two-point correlation functions in the bulk provided that $M \gg N^{33/34}$; we obtain
similar results for higher order correlation functions, assuming a stronger restriction on $M$. These results
generalize the earlier theorems from [3, 4, 16], which were valid for generalized Wigner matrices satisfying the
condition (1.1), under which $M$ is comparable to $N$. We obtain similar results if the condition $s_{ij} \ge c/N$ in
(1.1) is relaxed to $s_{ij} \ge N^{-1-\xi}$ with some small $\xi$. The exponent $\xi$ can be chosen near 1 for band matrices
with a broad band $W \asymp N$. In particular, we prove universality for such band matrices with a rapidly
vanishing mean-field component. These applications of the general Theorem 8.2 are listed in Corollary 8.3.



3. Examples 



In this section we give some important examples of random matrix models $H$. In each of the examples, we
give the deterministic matrix $S = (s_{ij})$ of the variances of the entries of $H$. The matrix $H$ is then obtained
from $h_{ij} = (s_{ij})^{1/2} \zeta_{ij}$. Here $(\zeta_{ij})$ is a Hermitian matrix whose upper-triangular entries are independent and
whose diagonal entries are real; moreover, we have $\mathbb{E} \zeta_{ij} = 0$, $\mathbb{E}|\zeta_{ij}|^2 = 1$, and the condition (2.6) for all $p$,
uniformly in $N$, $i$, and $j$.

Definition 3.1 (Full and flat Wigner matrices). Let $a \equiv a_N$ and $b \equiv b_N$ be possibly $N$-dependent
positive quantities. We call $H$ an $a$-full Wigner matrix if $S$ satisfies (2.3) and

\[ s_{ij} \ge \frac{a}{N} . \tag{3.1} \]

Similarly, we call $H$ a $b$-flat Wigner matrix if $S$ satisfies (2.3) and

\[ s_{ij} \le \frac{b}{N} . \]

(Note that in this case we have $M \ge N/b$.)

If $a$ and $b$ are independent of $N$ we call an $a$-full Wigner matrix simply full and a $b$-flat Wigner matrix
simply flat. In particular, generalized Wigner matrices, satisfying (1.1), are full and flat Wigner matrices.

Definition 3.2 (Band matrix). Fix $d \in \mathbb{N}$. Let $f$ be a bounded and symmetric (i.e. $f(x) = f(-x)$)
probability density on $\mathbb{R}^d$. Let $L$ and $W$ be integers satisfying

\[ L^{\delta'} \le W \le L \]

for some fixed $\delta' > 0$. Define the $d$-dimensional discrete torus

\[ \mathbb{T}_L^d = [-L/2, L/2)^d \cap \mathbb{Z}^d . \]

Thus, $\mathbb{T}_L^d$ has $N = L^d$ lattice points; and we may identify $\mathbb{T}_L^d$ with $\{1, \dots, N\}$. We define the canonical
representative of $i \in \mathbb{Z}^d$ through

\[ [i]_L := (i + L\mathbb{Z}^d) \cap \mathbb{T}_L^d . \]

Then $H$ is a $d$-dimensional band matrix with band width $W$ and profile function $f$ if

\[ s_{ij} = \frac{1}{Z_L} f\left( \frac{[i - j]_L}{W} \right) , \]

where $Z_L$ is a normalization chosen so that (2.3) holds.
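To make Definition 3.2 concrete, the following sketch builds the variance matrix $S$ of a one-dimensional ($d = 1$) band matrix in plain Python. It is an illustration under our own assumptions: the function names are hypothetical, the sites are indexed by $0, \dots, L-1$ rather than by the torus itself, and the box profile $f(x) = \frac{1}{2} \mathbf{1}(|x| \le 1)$ is just one admissible choice of $f$.

```python
def box_profile(x):
    """A bounded symmetric probability density on R: uniform on [-1, 1]."""
    return 0.5 if abs(x) <= 1 else 0.0


def band_variance_profile(L, W, f=box_profile):
    """Variance matrix s_ij = f([i - j]_L / W) / Z_L of a one-dimensional band matrix."""

    def canonical_representative(k):
        # Representative of k modulo L in the torus [-L/2, L/2).
        return (k + L // 2) % L - L // 2

    weights = [f(canonical_representative(k) / W) for k in range(L)]
    Z = sum(weights)  # normalization that makes every row sum equal to 1, as in (2.3)
    return [[weights[(i - j) % L] / Z for j in range(L)] for i in range(L)]


def effective_band_parameter(S):
    """The parameter M = 1 / max_ij s_ij from (2.1)."""
    return 1.0 / max(max(row) for row in S)
```

With `L = 50` and `W = 5` the box profile gives 11 nonzero entries per row, each equal to $1/11$, so that $M = 11$ is comparable to $W$, in line with the bound $s_{ij} \le C/W^d$ mentioned in the introduction.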

Definition 3.3 (Band matrix with a mean-field component). Let $H_B$ be a $d$-dimensional band matrix
from Definition 3.2. Let $H_W$ be an independent $a$-full Wigner matrix indexed by the set $\mathbb{T}_L^d$. The matrix
$H := \sqrt{1 - \nu} H_B + \sqrt{\nu} H_W$, with some $\nu \in [0, 1]$, is called a band matrix with a mean-field component.

The example of Definition 3.3 is a mixture of the previous two. We are especially interested in the case
$\nu \ll 1$, when most of the variance comes from the band matrix, i.e. the profile of $S$ is very close to a sharp
band.

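We conclude with some explicit bounds for these examples. The behaviour of $\Gamma$ and $\widetilde{\Gamma}$ near the spectral
edge is governed by the parameter

\[ \theta \equiv \theta(z) := \begin{cases} \kappa + \dfrac{\eta}{\sqrt{\kappa + \eta}} & \text{if } |E| \le 2 , \\[2mm] \dfrac{\eta}{\sqrt{\kappa + \eta}} & \text{if } |E| > 2 , \end{cases} \tag{3.2} \]

where we set, as usual, $\kappa \equiv \kappa_E$ and $z = E + i\eta$. Note that the parameter $\theta$ may be bounded from below
by $(\operatorname{Im} m)^2$. The following results follow immediately from Propositions A.2 and A.3 in Appendix A. They
hold for an arbitrary spectral domain $\mathbf{D}$.

(i) For general $H$ and any constant $c > 0$, there is a constant $C > 0$ such that

\[ C^{-1} \le \widetilde{\Gamma} \le \Gamma \le C \log N \]

provided $\operatorname{dist}(E, \{-2, 0, 2\}) \ge c$.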

(ii) For a full Wigner matrix we have 



where C depends on the constant a in Definition 3.1 but c does not. 



(iii) For a band matrix with a mean-field component, as in Definition 3.3, we have

\[ c \le \widetilde{\Gamma} \le \frac{C \log N}{(W/L)^2 + \nu a + \theta} . \]

The case $\nu = 0$ corresponds to a band matrix from Definition 3.2.
4. Tools



In this subsection we collect some basic facts that will be used throughout the paper. For two positive
quantities $A_N$ and $B_N$ we use the notation $A_N \asymp B_N$ to mean $c A_N \le B_N \le C A_N$. Throughout the
following we shall frequently drop the arguments $z$ and $N$, bearing in mind that we are dealing with a
function on some spectral domain $\mathbf{D}$.

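Definition 4.1 (Minors). For $T \subset \{1, \dots, N\}$ we define $H^{(T)}$ by

\[ (H^{(T)})_{ij} := \mathbf{1}(i \notin T) \mathbf{1}(j \notin T) h_{ij} . \]

Moreover, we define the resolvent of $H^{(T)}$ through

\[ G^{(T)}_{ij}(z) := \mathbf{1}(i \notin T) \mathbf{1}(j \notin T) (H^{(T)} - z)^{-1}_{ij} . \]

We also set

\[ \sum_i^{(T)} := \sum_{i : i \notin T} . \]

When $T = \{a\}$, we abbreviate $(\{a\})$ by $(a)$ in the above definitions; similarly, we write $(ab)$ instead of $(\{a, b\})$.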

Definition 4.2 (Partial expectation and independence). Let $X \equiv X(H)$ be a random variable. For
$i \in \{1, \dots, N\}$ define the operations $P_i$ and $Q_i$ through

\[ P_i X := \mathbb{E}(X | H^{(i)}) , \qquad Q_i X := X - P_i X . \]

We call $P_i$ partial expectation in the index $i$. Moreover, we say that $X$ is independent of $T \subset \{1, \dots, N\}$ if
$X = P_i X$ for all $i \in T$.

We introduce the random $z$-dependent control parameters

\[ \Lambda_o := \max_{i \ne j} |G_{ij}| , \qquad \Lambda_d := \max_i |G_{ii} - m| , \qquad \Lambda := \max\{\Lambda_o, \Lambda_d\} , \qquad \Theta := |m_N - m| . \tag{4.1} \]

We remark that the letter $\Lambda$ had a different meaning in several earlier papers, such as [16]. The following
lemma collects basic bounds on $m$.

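Lemma 4.3. There is a constant $c > 0$ such that for $E \in [-10, 10]$ and $\eta \in (0, 10]$ we have

$$c \le |m(z)| \le 1 - c\eta, \tag{4.2}$$

$$|1 - m^2(z)| \asymp \sqrt{\kappa + \eta}, \tag{4.3}$$

as well as

$$\operatorname{Im} m(z) \asymp \begin{cases} \sqrt{\kappa + \eta} & \text{if } |E| \le 2, \\ \dfrac{\eta}{\sqrt{\kappa + \eta}} & \text{if } |E| \ge 2. \end{cases} \tag{4.4}$$

Proof. The proof is an elementary exercise using (2.9). □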
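The bounds of Lemma 4.3 are easy to probe numerically. The following sketch is an illustration only: it assumes the standard closed form $m(z) = (-z + \sqrt{z^2 - 4})/2$ with the branch $\operatorname{Im} m > 0$ (equivalent to $m + 1/m + z = 0$) and takes $\kappa = \big||E| - 2\big|$; the function names are hypothetical. It evaluates the ratio appearing in (4.3) on a grid of spectral parameters.

```python
import cmath
import math

def m_sc(z: complex) -> complex:
    """Root of m + 1/m + z = 0 with Im m > 0 (semicircle Stieltjes transform)."""
    root = cmath.sqrt(z * z - 4)
    m = (-z + root) / 2
    # The two roots multiply to 1, so for Im z > 0 exactly one of them has Im m > 0.
    return m if m.imag > 0 else (-z - root) / 2

def ratio_43(E: float, eta: float) -> float:
    """|1 - m(z)^2| / sqrt(kappa + eta), which (4.3) predicts is bounded above and below."""
    kappa = abs(abs(E) - 2.0)  # assumption: distance from E to the nearest spectral edge
    return abs(1 - m_sc(complex(E, eta)) ** 2) / math.sqrt(kappa + eta)

# E ranges over [-10, 10] in steps of 1/2, eta over 10, 1, ..., 1e-6.
grid = [(E / 2.0, 10.0 ** (-k)) for E in range(-20, 21) for k in range(-1, 7)]
ratios = [ratio_43(E, eta) for E, eta in grid]
print("min ratio:", min(ratios), "max ratio:", max(ratios))
```

The printed extremes give a rough idea of the implicit constants in (4.3) on this grid; they are not sharp and depend on the chosen range of $E$ and $\eta$.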



In particular, recalling that $-1 \le S \le 1$ and using the upper bound $|m| \le C$ from (4.2), we find that there is a constant $c > 0$ such that

$$c \le \tilde\Gamma \le \Gamma. \tag{4.5}$$

The following lemma collects basic algebraic properties of stochastic domination $\prec$. Roughly, it states that $\prec$ satisfies the usual arithmetic properties of order relations. We shall use it tacitly throughout the following.

Lemma 4.4. (i) Suppose that $X(u,v) \prec Y(u,v)$ uniformly in $u \in U$ and $v \in V$. If $|V| \le N^C$ for some constant $C$ then

$$\sum_{v \in V} X(u,v) \prec \sum_{v \in V} Y(u,v)$$

uniformly in $u$.

(ii) Suppose that $X_1(u) \prec Y_1(u)$ uniformly in $u$ and $X_2(u) \prec Y_2(u)$ uniformly in $u$. Then $X_1(u)X_2(u) \prec Y_1(u)Y_2(u)$ uniformly in $u$.



(Hi) If X < Y + aX for some deterministic constant a G (0, 1) then X ^Y . 
Proof. The claims follow from a simple union bound. □



The following resolvent identities form the backbone of all of our calculations. The idea behind them is that a resolvent matrix element $G_{ij}$ depends strongly on the $i$-th and $j$-th columns of $H$, but weakly on all other columns. The first identity determines how to make a resolvent matrix element $G_{ij}$ independent of an additional index $k \ne i, j$. The second identity expresses the dependence of a resolvent matrix element $G_{ij}$ on the matrix elements in the $i$-th or in the $j$-th column of $H$.



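Lemma 4.5 (Resolvent identities). For any Hermitian matrix $H$ and $T \subset \{1, \dots, N\}$ the following identities hold. If $i, j, k \notin T$ and $i, j \ne k$ then

$$G_{ij}^{(T)} = G_{ij}^{(Tk)} + \frac{G_{ik}^{(T)} G_{kj}^{(T)}}{G_{kk}^{(T)}}, \qquad \frac{1}{G_{ii}^{(T)}} = \frac{1}{G_{ii}^{(Tk)}} - \frac{G_{ik}^{(T)} G_{ki}^{(T)}}{G_{ii}^{(T)} G_{ii}^{(Tk)} G_{kk}^{(T)}}. \tag{4.6}$$

If $i, j \notin T$ satisfy $i \ne j$ then

$$G_{ij}^{(T)} = -G_{ii}^{(T)} \sum_k^{(Ti)} h_{ik} G_{kj}^{(Ti)} = -G_{jj}^{(T)} \sum_k^{(Tj)} G_{ik}^{(Tj)} h_{kj}. \tag{4.7}$$

Proof. This is an exercise in linear algebra. The first identity (4.6) was proved in Lemma 4.2 of [14] and the second is an immediate consequence of the first. The identity (4.7) is proved in Lemma 6.10 of [4]. □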
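Since these identities are pure linear algebra, they can be checked on a small random matrix. The sketch below is an illustration under stated assumptions: it takes $T = \emptyset$, uses the convention that $H^{(k)}$ is obtained from $H$ by setting the $k$-th row and column to zero, and tests the identities in the forms $G_{ij} = G^{(k)}_{ij} + G_{ik}G_{kj}/G_{kk}$ and $G_{ij} = -G_{ii}\sum_{k \ne i} h_{ik} G^{(i)}_{kj}$. All function names are hypothetical, and the matrix inverse is a plain Gauss-Jordan routine so that only the standard library is needed.

```python
import random

def inverse(a):
    """Invert a square complex matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [[complex(x) for x in row] + [complex(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def resolvent(h, z, removed=()):
    """(H^{(T)} - z)^{-1}, where H^{(T)} has the rows and columns in T set to zero."""
    n = len(h)
    return inverse([[(0 if i in removed or j in removed else h[i][j]) - (z if i == j else 0)
                     for j in range(n)] for i in range(n)])

def random_hermitian(n, rng):
    h = [[0j] * n for _ in range(n)]
    for i in range(n):
        h[i][i] = complex(rng.gauss(0, 1))
        for j in range(i + 1, n):
            h[i][j] = complex(rng.gauss(0, 1), rng.gauss(0, 1))
            h[j][i] = h[i][j].conjugate()
    return h

def max_errors(n=6, z=0.3 + 0.7j, seed=0):
    """Largest violation of the first identity in (4.6) and of (4.7), with T empty."""
    h = random_hermitian(n, random.Random(seed))
    g = resolvent(h, z)
    minors = [resolvent(h, z, removed=(k,)) for k in range(n)]
    err46 = max(abs(g[i][j] - minors[k][i][j] - g[i][k] * g[k][j] / g[k][k])
                for k in range(n) for i in range(n) for j in range(n)
                if i != k and j != k)
    err47 = max(abs(g[i][j] + g[i][i] * sum(h[i][k] * minors[i][k][j]
                                             for k in range(n) if k != i))
                for i in range(n) for j in range(n) if i != j)
    return err46, err47

print(max_errors())
```

Both returned numbers should be at the level of floating-point round-off; the matrix size, spectral parameter and seed are arbitrary choices.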

Our final tool consists of the following results on fluctuation averaging. They exploit cancellations in sums of fluctuating quantities involving resolvent matrix entries. A very general result was obtained in [6]; in this paper we state a special case sufficient for our purposes here, and give a relatively simple proof in Appendix B. We consider weighted averages of diagonal resolvent matrix entries $G_{kk}$. They are weakly dependent, but the correlation between $G_{kk}$ and $G_{mm}$ for $m \ne k$ is not sufficiently small to apply the general theory of sums of weakly dependent random variables; instead, we need to exploit the precise form of the dependence using the resolvent structure.

It turns out that the key quantity that controls the magnitude of the fluctuations is $\Lambda$. However, being a random variable, $\Lambda$ itself is unsuitable as an upper bound. For technical reasons (our proof relies on a high-moment estimate combined with Chebyshev's inequality), it is essential that $\Lambda$ be estimated by a deterministic control parameter, which we call $\Psi$. The error terms are then estimated in terms of powers of $\Psi$. We shall always assume that $\Psi$ satisfies

$$M^{-1/2} \le \Psi \le M^{-c} \tag{4.8}$$

in the spectral domain $\mathbf{D}$, where $c > 0$ is some constant. We shall perform the averaging with respect to a family of weights $T = (t_{ik})$ satisfying

$$0 \le t_{ik} \le M^{-1}, \qquad \sum_k t_{ik} = 1. \tag{4.9}$$

Typical example weights are $t_{ik} = s_{ik}$ and $t_{ik} = N^{-1}$. Note that in both of these cases $T$ commutes with $S$. We introduce the average of a vector $(a_i)_{i=1}^N$ through

$$[a] := \frac{1}{N} \sum_i a_i. \tag{4.10}$$



Theorem 4.6 (Fluctuation averaging). Fix a spectral domain $\mathbf{D}$ and a deterministic control parameter $\Psi$ satisfying (4.8). Suppose that $\Lambda \prec \Psi$ and the weight $T = (t_{ik})$ satisfies (4.9). Then we have

$$\sum_k t_{ik} Q_k \frac{1}{G_{kk}} = O_\prec(\Psi^2), \qquad \sum_k t_{ik} Q_k G_{kk} = O_\prec(\Psi^2). \tag{4.11}$$

If in addition $T$ commutes with $S$ then

$$\sum_k t_{ik} v_k = O_\prec(\Gamma \Psi^2), \qquad \sum_k t_{ik} (v_k - [v]) = O_\prec(\tilde\Gamma \Psi^2), \tag{4.12}$$

where we defined $v_i := G_{ii} - m$. The estimates (4.11) and (4.12) are uniform in the index $i$.



In fact, the first bound of (4.11) can be improved as follows.

Theorem 4.7. Fix a spectral domain $\mathbf{D}$ and deterministic control parameters $\Psi$ and $\Psi_o$, both satisfying (4.8). Suppose that $\Lambda \prec \Psi$, $\Lambda_o \prec \Psi_o$, and that the weight $T = (t_{ik})$ satisfies (4.9). Then

$$\sum_k t_{ik} Q_k \frac{1}{G_{kk}} = O_\prec(\Psi_o^2). \tag{4.13}$$



Remark 4.8. The first instance of the fluctuation averaging mechanism appeared for the Wigner case, where $[Z] = N^{-1} \sum_k Z_k$ was proved to be bounded by $\Lambda^2$. Since $Q_k [G_{kk}]^{-1}$ is essentially $Z_k$ (see (5.6) below), this corresponds to the first bound in (4.11). A different proof (with a better bound on the constants)
was given in |16| . A conceptually streamlined version of the original proof was extended to sparse matrices [s] 
and to sample covariance matrices [21]. Finally, an extensive analysis in [6] treated the fluctuation averaging of general polynomials of resolvent entries and identified the order of cancellations depending on the algebraic structure of the polynomial. Moreover, in [6] an additional cancellation effect was found for the quantity $Q_i |G_{ij}|^2$. These improvements played a key role in obtaining the diffusion profile for the resolvent of band matrices and the estimate (2.21) in [5].

All proofs of the fluctuation averaging theorems rely on computing expectations of high moments of the averages, and carefully estimating the resulting terms. In [6] a diagrammatic representation was developed for bookkeeping such terms, but this is necessary only for the case of general polynomials. For the special cases given in Theorem 4.6, the proof is relatively simple and it is presented in Appendix B.



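5. A simpler proof using $\Gamma$ instead of $\tilde\Gamma$

In this section we prove the following weaker version of Theorem 2.3. In analogy to (2.14), we introduce the lower boundary

$$\eta_E := \min\Big\{ \eta \,:\, \frac{1}{M\eta} \le \min\Big\{ \frac{M^{-\gamma}}{\Gamma(z)^3}, \frac{M^{-2\gamma}}{\Gamma(z)^4 \operatorname{Im} m(z)} \Big\} \text{ for all } z \in [E + i\eta, E + 10i] \Big\}. \tag{5.1}$$

Theorem 5.1. Fix $\gamma \in (0, 1/2)$ and define the spectral domain

$$\mathbf{S} \equiv \mathbf{S}^{(N)}(\gamma) := \{ E + i\eta \,:\, |E| \le 10,\ \eta_E \le \eta \le 10 \}. \tag{5.2}$$

We have the bounds

$$|G_{ij}(z) - \delta_{ij} m(z)| \prec \Pi(z) \tag{5.3}$$

uniformly in $i, j$ and $z \in \mathbf{S}$, as well as

$$|m_N(z) - m(z)| \prec \frac{1}{M\eta} \tag{5.4}$$

uniformly in $z \in \mathbf{S}$.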

Note that the only difference between Theorems 2.3 and 5.1 is that $\tilde\Gamma$ was replaced with the larger quantity $\Gamma$ in the definition of the threshold $\eta_E$ and the spectral domain, so that

$$\frac{1}{M} \le \tilde\eta_E \le \eta_E, \qquad \mathbf{S} \subset \tilde{\mathbf{S}}. \tag{5.5}$$
Hence Theorem 5.1 is indeed weaker than Theorem 2.3, since it holds on a smaller spectral domain. As outlined after (2.11) and discussed in detail in Appendix A, Theorems 5.1 and 2.3 are equivalent provided $E$ is separated from the spectral edges $\pm 2$.

The rest of this section is devoted to the proof of Theorem 5.1. We give the full proof of Theorem 5.1 for pedagogical reasons, since it is simpler than that of Theorem 2.3 but already contains several of its key ideas. Theorem 2.3 will be proved in Section 6. One big difference between the two proofs is that in Theorem 5.1 the main control parameter is $\Lambda$, while in Theorem 2.3 we have to keep track of two control parameters, $\Lambda$ and the smaller $\Theta$.

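5.1. The self-consistent equation. The key tool behind the proof is a self-consistent equation for the diagonal entries of $G$. The starting point is Schur's complement formula, which we write as

$$\frac{1}{G_{ii}} = h_{ii} - z - \sum_{k,l}^{(i)} h_{ik} G^{(i)}_{kl} h_{li}. \tag{5.6}$$

The partial expectation with respect to the index $i$ (see Definition 4.2) of the last term on the right-hand side reads

$$P_i \sum_{k,l}^{(i)} h_{ik} G^{(i)}_{kl} h_{li} = \sum_k^{(i)} s_{ik} G^{(i)}_{kk} = \sum_k^{(i)} s_{ik} G_{kk} - \sum_k^{(i)} s_{ik} \frac{G_{ik} G_{ki}}{G_{ii}} = \sum_k s_{ik} G_{kk} - \sum_k s_{ik} \frac{G_{ik} G_{ki}}{G_{ii}},$$

where in the first step we used (2.1) and in the second (4.6); in the last step the two terms with $k = i$ cancel. Introducing the notation

$$v_i := G_{ii} - m$$

and recalling (2.3), we therefore get from (5.6) that

$$\frac{1}{G_{ii}} = -z - m + \Upsilon_i - \sum_k s_{ik} v_k, \tag{5.7}$$

where we introduced the fluctuating error term

$$\Upsilon_i := A_i + h_{ii} - Z_i, \qquad A_i := \sum_k s_{ik} \frac{G_{ik} G_{ki}}{G_{ii}}, \qquad Z_i := Q_i \sum_{k,l}^{(i)} h_{ik} G^{(i)}_{kl} h_{li}. \tag{5.8}$$

Using (2.8), we therefore get the self-consistent equation

$$-\sum_k s_{ik} v_k + \Upsilon_i = \frac{1}{m + v_i} - \frac{1}{m}. \tag{5.9}$$

Notice that this is an equation for the family $(v_i)_{i=1}^N$, with random error terms $\Upsilon_i$.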

Self-consistent equations play a crucial role in analysing resolvents of random matrices. The simplest one is the scalar (or first level) self-consistent equation for $m_N(z)$, the Stieltjes transform of the empirical density (2.12). By averaging the inverse of (5.7) and neglecting the error terms, one obtains that $m_N$ approximately satisfies the equation $m = -(m + z)^{-1}$, which is the defining relation for the Stieltjes transform of the semicircle law (2.8).
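The scalar equation $m = -(m + z)^{-1}$ can be explored numerically by naive fixed-point iteration. The sketch below is only an illustration, not part of the argument: the starting point $m = i$, the number of steps and the function names are arbitrary choices, and the closed form $m(z) = (-z + \sqrt{z^2 - 4})/2$ with $\operatorname{Im} m > 0$ is assumed for comparison.

```python
import cmath

def m_closed_form(z: complex) -> complex:
    """Solution of m = -1/(m + z) with Im m > 0."""
    root = cmath.sqrt(z * z - 4)
    m = (-z + root) / 2
    return m if m.imag > 0 else (-z - root) / 2

def m_fixed_point(z: complex, steps: int = 500) -> complex:
    """Iterate m -> -1/(m + z), starting from m = i."""
    m = 1j
    for _ in range(steps):
        m = -1 / (m + z)
    return m

z = 0.5 + 0.5j
print(m_fixed_point(z), m_closed_form(z))
```

The derivative of the map $m \mapsto -1/(m + z)$ at the fixed point equals $m^2$, so by (4.2) the iteration contracts at a rate governed by $|m|^2 \le (1 - c\eta)^2$. This is fast for $\eta$ of order one and slow near the real axis, which is one way to see why stability degrades for small $\eta$.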

The vector (or second level) self-consistent equation, as given in (5.9), allows one to control not only fluctuations of $m_N - m$ but also those of $G_{ii} - m$. The equation (5.9) first appeared in [14], where a systematic study of resolvent entries of random matrices was initiated.

For completeness, we mention that a matrix (or third level) self-consistent equation for local averages of $|G_{ij}|^2$ was introduced in [5]. This equation constitutes the backbone of the study of the diffusion profile of the resolvent entries of random band matrices.
5.2. Estimate of the error in terms of $\Lambda$.

Lemma 5.2. The following statements hold for any spectral domain $\mathbf{D}$. Let $\phi$ be the indicator function of some (possibly $z$-dependent) event. If $\phi\Lambda \prec M^{-c}$ for some $c > 0$ then

$$\phi\big(\Lambda_o + |Z_i| + |\Upsilon_i|\big) \prec \sqrt{\frac{\operatorname{Im} m + \Lambda}{M\eta}} \tag{5.10}$$

uniformly in $z \in \mathbf{D}$. Moreover, for any fixed ($N$-independent) $\eta > 0$ we have

$$\Lambda_o + |Z_i| + |\Upsilon_i| \prec M^{-1/2} \tag{5.11}$$

uniformly in $z \in \{w \in \mathbf{D} : \operatorname{Im} w = \eta\}$.

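Proof. We begin with the first statement. We shall often use the fact that, by the lower bound of (4.2) and the assumption $\phi\Lambda \prec M^{-c}$, we have

$$\phi / |G_{ii}| \prec 1. \tag{5.12}$$

First we estimate $Z_i$, which we split as

$$\phi |Z_i| \le \phi \Big| \sum_k^{(i)} \big(|h_{ik}|^2 - s_{ik}\big) G^{(i)}_{kk} \Big| + \phi \Big| \sum_{k \ne l}^{(i)} h_{ik} G^{(i)}_{kl} h_{li} \Big|. \tag{5.13}$$

We estimate each term using the large deviation estimates from Theorem C.1, by conditioning on $G^{(i)}$ and using the fact that the family $(h_{ik})_{k=1}^N$ is independent of $G^{(i)}$. By (C.2), the first term of (5.13) is stochastically dominated by $\phi \big( \sum_k^{(i)} |s_{ik} G^{(i)}_{kk}|^2 \big)^{1/2} \prec M^{-1/2}$, where we used the estimate (2.2) and $\phi |G^{(i)}_{kk}| \prec 1$, as follows from (4.6), (5.12), and the assumption $\phi\Lambda \prec M^{-c}$. For the second term of (5.13) we apply (C.4) with $a_{kl} = s_{ik}^{1/2} G^{(i)}_{kl} s_{li}^{1/2}$ and $X_k = \zeta_{ik}$. We find

$$\phi \sum_{k,l}^{(i)} s_{ik} |G^{(i)}_{kl}|^2 s_{li} \le \frac{\phi}{M} \sum_{k,l}^{(i)} s_{ik} |G^{(i)}_{kl}|^2 = \frac{\phi}{M\eta} \sum_k^{(i)} s_{ik} \operatorname{Im} G^{(i)}_{kk} \prec \frac{\operatorname{Im} m + \Lambda}{M\eta}, \tag{5.14}$$

where in the last step we used (4.6) and (5.12). Thus we get

$$\phi |Z_i| \prec \sqrt{\frac{\operatorname{Im} m + \Lambda}{M\eta}}, \tag{5.15}$$

where we absorbed the bound $M^{-1/2}$ on the first term of (5.13) into the right-hand side of (5.15), using $\operatorname{Im} m \ge c\eta$ as follows from (4.4).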



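Next, we estimate $\Lambda_o$. We can iterate (4.7) once to get, for $i \ne j$,

$$G_{ij} = -G_{ii} \sum_k^{(i)} h_{ik} G^{(i)}_{kj} = -G_{ii} G^{(i)}_{jj} \Big( h_{ij} - \sum_{k,l}^{(ij)} h_{ik} G^{(ij)}_{kl} h_{lj} \Big). \tag{5.16}$$

The term $h_{ij}$ is trivially $O_\prec(M^{-1/2})$. In order to estimate the other term, we invoke (C.3) with $a_{kl} = s_{ik}^{1/2} G^{(ij)}_{kl} s_{lj}^{1/2}$, $X_k = \zeta_{ik}$, and $Y_l = \zeta_{lj}$. As in (5.14), we find

$$\phi \sum_{k,l}^{(ij)} s_{ik} |G^{(ij)}_{kl}|^2 s_{lj} \prec \frac{\operatorname{Im} m + \Lambda}{M\eta}.$$

Thus we find

$$\phi \Lambda_o \prec \sqrt{\frac{\operatorname{Im} m + \Lambda}{M\eta}}, \tag{5.17}$$

where we again absorbed the term $h_{ij} \prec M^{-1/2}$ into the right-hand side.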



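In order to estimate $A_i$ and $h_{ii}$ in the definition of $\Upsilon_i$, we use (5.12) to estimate

$$\phi\big(|A_i| + |h_{ii}|\big) \prec \phi \Lambda_o^2 + M^{-1/2} \le \phi \Lambda_o + C \sqrt{\frac{\operatorname{Im} m}{M\eta}} \prec \sqrt{\frac{\operatorname{Im} m + \Lambda}{M\eta}},$$

where the second step follows from $\operatorname{Im} m \ge c\eta$ (recall (4.4)). This completes the proof of (5.10).

The proof of (5.11) is almost identical to that of (5.10). The quantities $|G^{(i)}_{kk}|$ and $|G^{(ij)}_{kk}|$ are estimated by the trivial deterministic bound $\eta^{-1}$. We omit the details. □

5.3. A rough bound on $\Lambda$. The next step in the proof of Theorem 5.1 is to establish the following rough bound on $\Lambda$.

Proposition 5.3. We have $\Lambda \prec M^{-\gamma/3} \Gamma^{-1}$ uniformly in $\mathbf{S}$.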



The rest of this subsection is devoted to the proof of Proposition 5.3. The core of the proof is a continuity argument. Its basic idea is to establish a gap in the range of $\Lambda$ of the form $\mathbf{1}(\Lambda \le M^{-\gamma/4} \Gamma^{-1}) \Lambda \prec M^{-\gamma/2} \Gamma^{-1}$ (Lemma 5.4 below). In other words, for all $z \in \mathbf{S}$, with high probability either $\Lambda \le M^{-\gamma/2} \Gamma^{-1}$ or $\Lambda \ge M^{-\gamma/4} \Gamma^{-1}$. For $z$ with a large imaginary part $\eta$, the estimate $\Lambda \le M^{-\gamma/2} \Gamma^{-1}$ is easy to prove using a simple expansion (Lemma 5.5 below). Thus, for large $\eta$ the parameter $\Lambda$ is below the gap. Using the fact that $\Lambda$ is continuous in $z$ and hence cannot jump from one side of the gap to the other, we then conclude that with high probability $\Lambda$ is below the gap for all $z \in \mathbf{S}$. See Figure 5.1 for an illustration of this argument.

Lemma 5.4. We have the bound

$$\mathbf{1}(\Lambda \le M^{-\gamma/4} \Gamma^{-1}) \Lambda \prec M^{-\gamma/2} \Gamma^{-1}$$

uniformly in $\mathbf{S}$.
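Proof. Set

$$\phi := \mathbf{1}(\Lambda \le M^{-\gamma/4} \Gamma^{-1}).$$

Then by definition we have $\phi\Lambda \le M^{-\gamma/4} \Gamma^{-1} \le C M^{-\gamma/4}$, where in the last step we used (4.5). Hence we may invoke (5.10) to estimate $\Lambda_o$ and $\Upsilon_i$. In order to estimate $\Lambda_d$, we expand the right-hand side of (5.9) in $v_i$ to get

$$\phi \Big( -\sum_k s_{ik} v_k + \Upsilon_i \Big) = \phi \big( -m^{-2} v_i + O(\Lambda^2) \big),$$

where we used (4.2) and that $|v_i| \le C M^{-\gamma/4}$ on the event $\{\phi = 1\}$. Using (5.10) we therefore have

$$\phi \Big( v_i - m^2 \sum_k s_{ik} v_k \Big) = O_\prec \Big( \Lambda^2 + \sqrt{\frac{\operatorname{Im} m + \Lambda}{M\eta}} \Big).$$

We write the left-hand side as $\phi [(1 - m^2 S) v]_i$ with the vector $v = (v_i)_{i=1}^N$. Inverting the operator $1 - m^2 S$, we therefore conclude that

$$\phi \Lambda_d = \phi \max_i |v_i| \prec \Gamma \Big( \Lambda^2 + \sqrt{\frac{\operatorname{Im} m + \Lambda}{M\eta}} \Big).$$

Recalling (4.5) and (5.10), we therefore get

$$\phi \Lambda \prec \phi \Gamma \Big( \Lambda^2 + \sqrt{\frac{\operatorname{Im} m + \Lambda}{M\eta}} \Big). \tag{5.18}$$

Next, by definition of $\phi$ we may estimate

$$\phi \Gamma \Lambda^2 \le M^{-\gamma/2} \Gamma^{-1}.$$

Moreover, by definitions of $\mathbf{S}$ and $\phi$ we have

$$\phi \Gamma \sqrt{\frac{\operatorname{Im} m + \Lambda}{M\eta}} \le \Gamma \sqrt{\frac{\operatorname{Im} m}{M\eta}} + \Gamma \sqrt{\frac{\Gamma^{-1}}{M\eta}} \le 2 M^{-\gamma/2} \Gamma^{-1}.$$

Plugging this into (5.18) yields $\phi\Lambda \prec M^{-\gamma/2} \Gamma^{-1}$, which is the claim. □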



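In order to start the continuity argument underlying the proof of Proposition 5.3, we need the following bound on $\Lambda$ for large $\eta$.

Lemma 5.5. We have $\Lambda \prec M^{-1/2}$ uniformly in $z \in [-10, 10] + 2i$.

Proof. We shall make use of the trivial bounds

$$|G^{(T)}_{ij}| \le \frac{1}{\eta} = \frac{1}{2}, \qquad |m| \le \frac{1}{\eta} = \frac{1}{2}. \tag{5.19}$$

From (5.11) we get

$$\Lambda_o + |Z_i| \prec M^{-1/2}. \tag{5.20}$$

Moreover, we use (4.6) and (5.16) to estimate

$$|A_i| \le \sum_j s_{ij} \Big| \frac{G_{ij} G_{ji}}{G_{ii}} \Big| \le M^{-1} + \sum_j^{(i)} s_{ij} \big| G_{ji} G^{(i)}_{jj} \big| \, \Big| h_{ij} - \sum_{k,l}^{(ij)} h_{ik} G^{(ij)}_{kl} h_{lj} \Big| \prec M^{-1/2},$$

where the last step follows using (C.3), exactly as the estimate of the right-hand side of (5.16) in the proof of Lemma 5.2. We conclude that $|\Upsilon_i| \prec M^{-1/2}$.

Next, we write (5.9) as

$$v_i = \frac{m \big( \sum_k s_{ik} v_k - \Upsilon_i \big)}{m^{-1} - \sum_k s_{ik} v_k + \Upsilon_i}.$$

Using $|m^{-1}| \ge 2$ and $|v_k| \le 1$, as follows from (5.19), we find

$$\Big| m^{-1} - \sum_k s_{ik} v_k + \Upsilon_i \Big| \ge 1 + O_\prec(M^{-1/2}).$$

Using $|m| \le 1/2$ we therefore conclude that

$$\Lambda_d \le \frac{\Lambda_d + O_\prec(M^{-1/2})}{2 + O_\prec(M^{-1/2})} = \frac{\Lambda_d}{2 + O_\prec(M^{-1/2})} + O_\prec(M^{-1/2}),$$

from which the claim follows together with the estimate on $\Lambda_o$ from (5.20). □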



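We may now conclude the proof of Proposition 5.3 by a continuity argument in $\eta = \operatorname{Im} z$. The gist of the continuity argument is depicted in Figure 5.1.

Proof of Proposition 5.3. Fix $D > 10$. Lemma 5.4 implies that for each $z \in \mathbf{S}$ we have

$$\mathbb{P}\big( M^{-\gamma/3} \Gamma(z)^{-1} \le \Lambda(z) \le M^{-\gamma/4} \Gamma(z)^{-1} \big) \le N^{-D} \tag{5.21}$$

for $N \ge N_0$, where $N_0 \equiv N_0(\gamma, D)$ does not depend on $z$.

Next, take a lattice $\Delta \subset \mathbf{S}$ such that $|\Delta| \le N^{10}$ and for each $z \in \mathbf{S}$ there exists a $w \in \Delta$ such that $|z - w| \le N^{-4}$. Then (5.21) combined with a union bound gives

$$\mathbb{P}\big( \exists w \in \Delta : M^{-\gamma/3} \Gamma(w)^{-1} \le \Lambda(w) \le M^{-\gamma/4} \Gamma(w)^{-1} \big) \le N^{-D+10} \tag{5.22}$$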
Figure 5.1. The $(\eta, \Lambda)$-plane for a fixed $E$. The shaded region is forbidden with high probability by Lemma 5.4. The initial estimate, given by Lemma 5.5, is marked with a black dot. The graph of $\Lambda = \Lambda(E + i\eta)$ is continuous and lies beneath the shaded region. Note that this method does not control $\Lambda(E + i\eta)$ in the regime $\eta \le \eta_E$.



for $N \ge N_0$. From the definitions of $\Lambda(z)$, $\Gamma(z)$, and $\mathbf{S}$ (recall (4.5)), we immediately find that $\Lambda$ and $\Gamma$ are Lipschitz continuous on $\mathbf{S}$, with Lipschitz constant at most $M^2$. Hence (5.22) implies

$$\mathbb{P}\big( \exists z \in \mathbf{S} : 2 M^{-\gamma/3} \Gamma(z)^{-1} \le \Lambda(z) \le 2^{-1} M^{-\gamma/4} \Gamma(z)^{-1} \big) \le N^{-D+10}$$

for $N \ge N_0$. We conclude that there is an event $\Xi$ satisfying $\mathbb{P}(\Xi) \ge 1 - N^{-D+10}$ such that, for each $z \in \mathbf{S}$, either $\mathbf{1}(\Xi) \Lambda(z) \le 2 M^{-\gamma/3} \Gamma(z)^{-1}$ or $\mathbf{1}(\Xi) \Lambda(z) \ge 2^{-1} M^{-\gamma/4} \Gamma(z)^{-1}$. Since $\Lambda$ is continuous and $\mathbf{S}$ is by definition connected, we conclude that either

$$\forall z \in \mathbf{S} : \mathbf{1}(\Xi) \Lambda(z) \le 2 M^{-\gamma/3} \Gamma(z)^{-1} \tag{5.23}$$

or

$$\forall z \in \mathbf{S} : \mathbf{1}(\Xi) \Lambda(z) \ge 2^{-1} M^{-\gamma/4} \Gamma(z)^{-1}. \tag{5.24}$$

(Here the bounds (5.23) and (5.24) each hold surely, i.e. for every realization of $\Lambda(z)$.)

It remains to show that (5.24) is impossible. In order to do so, it suffices to show that there exists a $z \in \mathbf{S}$ such that $\Lambda(z) < 2^{-1} M^{-\gamma/4} \Gamma(z)^{-1}$ with probability greater than $1/2$. But this holds for any $z$ with $\operatorname{Im} z = 2$, as follows from Lemma 5.5 and the bound $\Gamma \le C \eta^{-1}$, which itself follows easily by a simple expansion of $(1 - m^2 S)^{-1}$ combined with the bounds $\|S\|_{\ell^\infty \to \ell^\infty} \le 1$ and (4.2). This concludes the proof. □

5.4. Iteration step and conclusion of the proof of Theorem 5.1. In the following a key role will be played by deterministic control parameters $\Psi$ satisfying

$$c M^{-1/2} \le \Psi \le M^{-\gamma/3} \Gamma^{-1}. \tag{5.25}$$

(Using the definition of $\mathbf{S}$ and (4.4) it is not hard to check that the upper bound in (5.25) is always larger than the lower bound.) Suppose that $\Lambda \prec \Psi$ in $\mathbf{S}$ for some deterministic parameter $\Psi$ satisfying (5.25). For example, by Proposition 5.3 we may choose $\Psi = M^{-\gamma/3} \Gamma^{-1}$.

We now improve the estimate $\Lambda \prec \Psi$ iteratively. The iteration step is the content of the following proposition.

Proposition 5.6. Let $\Psi$ be a control parameter satisfying (5.25) and fix $\epsilon \in (0, \gamma/3)$. Then

$$\Lambda \prec \Psi \quad \Longrightarrow \quad \Lambda \prec F(\Psi), \tag{5.26}$$

where we defined

$$F(\Psi) := M^{-\epsilon} \Psi + \sqrt{\frac{\operatorname{Im} m}{M\eta}} + \frac{M^{\epsilon}}{M\eta}.$$


For the proof of Proposition 5.6 we need the following averaging result, which is a simple corollary of Theorem 4.6.

Lemma 5.7. Suppose that $\Lambda \prec \Psi$ for some deterministic control parameter $\Psi$ satisfying (4.8). Then $[\Upsilon] = O_\prec(\Psi^2)$ (recall the definition of the average $[\cdot]$ from (4.10)).

Proof. The claim easily follows from Schur's complement formula (5.6) written in the form

$$\Upsilon_i = A_i + Q_i \frac{1}{G_{ii}}.$$

We may therefore estimate $[\Upsilon]$ using the trivial bound $|A_i| \prec \Psi^2$ as well as the fluctuation averaging bound from the first estimate of (4.11) with $t_{ik} = 1/N$. □

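Proof of Proposition 5.6. Suppose that $\Lambda \prec \Psi$ for some deterministic control parameter $\Psi$ satisfying (5.25). We invoke Lemma 5.2 with $\phi = 1$ (recall the bound (4.5)) to get

$$\Lambda_o + |Z_i| + |\Upsilon_i| \prec \sqrt{\frac{\operatorname{Im} m + \Lambda}{M\eta}} \prec \sqrt{\frac{\operatorname{Im} m + \Psi}{M\eta}}. \tag{5.27}$$

Next, we estimate $\Lambda_d$. Define the $z$-dependent indicator function

$$\psi := \mathbf{1}(\Lambda \le M^{-\gamma/4}).$$

By (5.25), (4.5), and the assumption $\Lambda \prec \Psi$, we have $1 - \psi \prec 0$. On the event $\{\psi = 1\}$, we expand the right-hand side of (5.9) to get the bound

$$\psi |v_i| \le C \psi \Big| \sum_k s_{ik} v_k - \Upsilon_i \Big| + C \psi \Lambda^2.$$

Using the fluctuation averaging estimate (4.12) as well as (5.27), we find

$$\psi |v_i| \prec \Gamma \Psi^2 + \sqrt{\frac{\operatorname{Im} m + \Psi}{M\eta}}, \tag{5.28}$$

where we again used the lower bound from (4.5). Using $1 - \psi \prec 0$ we conclude

$$\Lambda_d \prec \Gamma \Psi^2 + \sqrt{\frac{\operatorname{Im} m + \Psi}{M\eta}}, \tag{5.29}$$

which, combined with (5.27), yields

$$\Lambda \prec \Gamma \Psi^2 + \sqrt{\frac{\operatorname{Im} m + \Psi}{M\eta}}. \tag{5.30}$$

Using Young's inequality and the assumption $\Psi \le M^{-\gamma/3} \Gamma^{-1}$ we conclude the proof. □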
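The last step of this proof can be spelled out as follows. This is a sketch written under the assumption that $F(\Psi)$ has the three-term form $M^{-\epsilon}\Psi + \sqrt{\operatorname{Im} m/(M\eta)} + M^{\epsilon}/(M\eta)$ with $\epsilon \in (0, \gamma/3)$, and that the bound to be simplified is $\Lambda \prec \Gamma\Psi^2 + \sqrt{(\operatorname{Im} m + \Psi)/(M\eta)}$.

```latex
% Split the square root, then apply Young's inequality sqrt(ab) <= (a + b)/2
% with a = M^{-\epsilon}\Psi and b = M^{\epsilon}/(M\eta):
\sqrt{\frac{\operatorname{Im} m + \Psi}{M\eta}}
  \le \sqrt{\frac{\operatorname{Im} m}{M\eta}} + \sqrt{\frac{\Psi}{M\eta}},
\qquad
\sqrt{\frac{\Psi}{M\eta}}
  = \sqrt{\bigl(M^{-\epsilon}\Psi\bigr)\cdot\frac{M^{\epsilon}}{M\eta}}
  \le \frac{1}{2}\Bigl(M^{-\epsilon}\Psi + \frac{M^{\epsilon}}{M\eta}\Bigr).

% The quadratic term is handled by the upper bound on \Psi and \epsilon < \gamma/3:
\Gamma\Psi^2 = (\Gamma\Psi)\,\Psi \le M^{-\gamma/3}\,\Psi \le M^{-\epsilon}\,\Psi .

% Adding the three estimates gives
\Gamma\Psi^2 + \sqrt{\frac{\operatorname{Im} m + \Psi}{M\eta}} \le 2\,F(\Psi).
```

The constant $2$ is harmless, since constants are absorbed by the relation $\prec$.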

For the remainder of the proof of Theorem 5.1 we work on the spectral domain $\mathbf{S}$. We claim that if $\Psi$ satisfies (5.25) then so does $F(\Psi)$. The lower bound $F(\Psi) \ge c M^{-1/2}$ is a consequence of the estimate $\operatorname{Im} m / \eta \ge c$, which follows from (4.4). The upper bound $M^{-\gamma/3 - \epsilon} \Gamma^{-1}$ on the first term of $F(\Psi)$ is trivial by assumption on $\Psi$. Moreover, the second term of $F(\Psi)$ satisfies $\sqrt{\operatorname{Im} m / (M\eta)} \le M^{-\gamma} \Gamma^{-2} \le C M^{-\gamma} \Gamma^{-1} \le M^{-\gamma/3 - \epsilon} \Gamma^{-1}$ by definition of $\mathbf{S}$ and the lower bound (4.5). Similarly, the last term of $F(\Psi)$ satisfies $M^{\epsilon} / (M\eta) \le C M^{\epsilon - \gamma} \Gamma^{-1} \le M^{-\gamma/3 - \epsilon} \Gamma^{-1}$ by definition of $\mathbf{S}$.



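We may therefore iterate (5.26). This yields a bound on $\Lambda$ that is essentially the fixed point of the map $\Psi \mapsto F(\Psi)$, which is $\Pi$ (up to the factor $M^{\epsilon}$). More precisely, the iteration is started with $\Psi_0 := M^{-\gamma/3} \Gamma^{-1}$; the initial hypothesis $\Lambda \prec \Psi_0$ is provided by the rough bound from Proposition 5.3. For $k \ge 0$ we set $\Psi_{k+1} := F(\Psi_k)$. Hence from (5.26) we conclude that $\Lambda \prec \Psi_k$ for all $k$. Choosing $k := \lceil \epsilon^{-1} \rceil$ yields

$$\Lambda \prec \sqrt{\frac{\operatorname{Im} m}{M\eta}} + \frac{M^{\epsilon}}{M\eta}.$$

Since $\epsilon$ was arbitrary, we have proved that

$$\Lambda \prec \Pi, \tag{5.31}$$

which is (5.3).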
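The effect of the iteration can be seen in a toy computation. The sketch below treats all quantities as plain numbers and is purely illustrative: the parameter values, the choice $\Gamma = 1$ and $\operatorname{Im} m = 1$, the function names, and the assumed forms $F(\Psi) = M^{-\epsilon}\Psi + \sqrt{\operatorname{Im} m/(M\eta)} + M^{\epsilon}/(M\eta)$ and $\Pi = \sqrt{\operatorname{Im} m/(M\eta)} + 1/(M\eta)$ are assumptions made for the example.

```python
import math

def iterate_F(M, eta, im_m, gam, Gamma, eps):
    """Iterate Psi -> F(Psi) ceil(1/eps) times, starting from Psi_0 = M^(-gam/3) / Gamma."""
    psi = M ** (-gam / 3) / Gamma
    history = [psi]
    for _ in range(math.ceil(1 / eps)):
        psi = M ** (-eps) * psi + math.sqrt(im_m / (M * eta)) + M ** eps / (M * eta)
        history.append(psi)
    return history

def Pi(M, eta, im_m):
    """Target control parameter sqrt(Im m / (M eta)) + 1 / (M eta)."""
    return math.sqrt(im_m / (M * eta)) + 1 / (M * eta)

# Hypothetical parameters, chosen so that 1/(M eta) <= M^(-2 gam) and eps < gam/3.
M, eta, eps = 1e6, 0.1, 0.05
hist = iterate_F(M=M, eta=eta, im_m=1.0, gam=0.3, Gamma=1.0, eps=eps)
print("Psi_0 =", hist[0], " final Psi =", hist[-1], " Pi =", Pi(M, eta, 1.0))
```

Each step multiplies the previous bound by $M^{-\epsilon}$ and adds a fixed term, so after about $1/\epsilon$ steps the memory of $\Psi_0$ is lost and the bound is comparable to $M^{\epsilon} \Pi$.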



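What remains is to prove (5.4), i.e. to estimate $\Theta$. We expand (5.9) on $\{\psi = 1\}$ to get

$$\psi\, m^2 \Big( -\sum_k s_{ik} v_k + \Upsilon_i \Big) = -\psi v_i + O(\psi \Lambda^2). \tag{5.32}$$

Averaging in (5.32) yields

$$\psi\, m^2 \big( -[v] + [\Upsilon] \big) = -\psi [v] + O(\psi \Lambda^2).$$

By (5.31) and (5.27) with $\Psi = \Pi$, we have $\Lambda + |\Upsilon_i| \prec \Pi$. Moreover, by Lemma 5.7 we have $|[\Upsilon]| \prec \Pi^2$. Thus we get

$$\psi [v] = m^2 \psi [v] + O_\prec(\Pi^2).$$

Since $1 - \psi \prec 0$, we conclude that $[v] = m^2 [v] + O_\prec(\Pi^2)$. Therefore

$$|[v]| \prec \frac{\Pi^2}{|1 - m^2|} \le C \Big( \frac{\operatorname{Im} m}{|1 - m^2|} + \frac{1}{|1 - m^2|} \frac{1}{M\eta} \Big) \frac{1}{M\eta} \le C \Big( 1 + \frac{\Gamma}{M\eta} \Big) \frac{1}{M\eta} \le \frac{C}{M\eta}.$$

Here in the third step we used (4.3), (4.4), and the bound $\Gamma \ge |1 - m^2|^{-1}$, which follows from the definition of $\Gamma$ by applying the matrix $(1 - m^2 S)^{-1}$ to the vector $e = N^{-1/2} (1, 1, \dots, 1)^*$. The last step follows from the definition of $\mathbf{S}$. Since $\Theta = |[v]|$, this concludes the proof of (5.4), and hence of Theorem 5.1.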
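The bound $\Gamma \ge |1 - m^2|^{-1}$ used above can be made concrete in the simplest mean-field example. The sketch below is an illustration under explicit assumptions: it takes $\Gamma$ to be the $\ell^\infty \to \ell^\infty$ operator norm of $(1 - m^2 S)^{-1}$ (the maximal absolute row sum) and considers the flat variance profile $s_{ij} = 1/N$, for which $S^2 = S$ and hence $(1 - m^2 S)^{-1} = 1 + \frac{m^2}{1 - m^2} S$. The function names are hypothetical.

```python
import cmath

def m_sc(z: complex) -> complex:
    """Root of m + 1/m + z = 0 with Im m > 0."""
    root = cmath.sqrt(z * z - 4)
    m = (-z + root) / 2
    return m if m.imag > 0 else (-z - root) / 2

def gamma_flat(z: complex, n: int) -> float:
    """Max absolute row sum of (1 - m^2 S)^{-1} = 1 + c S for S = (1/n) * all-ones matrix."""
    m2 = m_sc(z) ** 2
    c = m2 / (1 - m2)
    # Each row has one diagonal entry 1 + c/n and n - 1 off-diagonal entries c/n.
    return abs(1 + c / n) + (n - 1) * abs(c) / n

def lower_bound(z: complex) -> float:
    """|1 - m^2|^{-1}, obtained by testing the inverse on the constant vector."""
    return 1 / abs(1 - m_sc(z) ** 2)

for z in (0.1j, 1.9 + 0.01j, 2 + 0.001j, 3 + 0.5j):
    print(z, gamma_flat(z, 100), lower_bound(z))
```

In this example the operator acts as the identity on the orthogonal complement of the constant vector, so the growth of $\Gamma$ near the edges $\pm 2$ comes entirely from the direction $e$; this is the reason for treating that direction separately.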



6. Proof of Theorem 2.3

The key novelty in this proof is that we solve the self-consistent equation (5.9) separately on the subspace of constants (the span of the vector $e$) and on its orthogonal complement $e^\perp$. On the space of constant vectors, it becomes a scalar equation for the average $[v]$, which can be expanded up to second order. Near the spectral edges $\pm 2$, the resulting quadratic self-consistent scalar equation (given in (6.2) below) is more effective than its linearized version. On the space orthogonal to the constants, we still solve a self-consistent vector equation, but the stability will now be quantified using $\tilde\Gamma$ instead of the larger quantity $\Gamma$.

Accordingly, the main control parameter in this proof is $\Theta = |[v]|$, and the key iterative scheme (Lemma 6.7 below) is formulated in terms of $\Theta$. However, many intermediate estimates still involve $\Lambda$. In particular, the self-consistent equation (5.9) is effective only in the regime where $v_i$ is already small. Hence we need two preparatory steps. In Section 6.1 we will prove an a priori bound on $\Lambda$, essentially showing that $\Lambda \ll 1$. This proof itself is a continuity argument (see Figure 6.1 for a graphical illustration) similar to the proof of Proposition 5.3; now, however, we have to follow $\Lambda$ and $\Theta$ in tandem. The main reason why $\Theta$ is already involved in this part is that we work in the larger spectral domain $\tilde{\mathbf{S}}$ defined using $\tilde\Gamma$. Thus, already in this preparatory step, the self-consistent equation has to be solved separately on the subspace of constants and its orthogonal complement.

In Section 6.2, we control $\Lambda$ in terms of $\Theta$, which allows us to obtain a self-consistent equation involving only $\Theta$. In this step we use the Fluctuation Averaging Theorem to obtain a quadratic estimate which, very roughly, states that $\Lambda \lesssim \Theta + \Lambda^2$ (see (6.20) below for the precise statement). This implies $\Lambda \lesssim \Theta$ in the regime $\Lambda \ll 1$.

Finally, in Section 6.3 we solve the quadratic iteration for $\Theta$. Since the corresponding quadratic equation has a dichotomy and for large $\eta = \operatorname{Im} z$ we know that $\Theta$ is small by direct expansion, a continuity argument similar to the proof of Proposition 5.3 will complete the proof.
6.1. A rough bound on $\Lambda$. In this section we prove the following a priori bounds on both control parameters, $\Lambda$ and $\Theta$.

Proposition 6.1. In $\tilde{\mathbf{S}}$ we have the bounds

$$\Lambda \prec M^{-\gamma/3} \tilde\Gamma^{-1}, \qquad \Theta \prec (M\eta)^{-1/3}.$$

Before embarking on the proof of Proposition 6.1, we state some preparatory lemmas. First, we derive the key equation for $[v] = N^{-1} \sum_i v_i$, the average of $v_i$.

Lemma 6.2. Define the $z$-dependent indicator function

$$\phi := \mathbf{1}(\Lambda \le M^{-\gamma/4} \tilde\Gamma^{-1}) \tag{6.1}$$

and the random control parameter

$$q(\Theta) := \sqrt{\frac{\operatorname{Im} m + \Theta}{M\eta}} + \frac{\tilde\Gamma}{M\eta}.$$

Then we have

$$\phi \big( (1 - m^2) [v] - m^{-1} [v]^2 \big) = \phi\, O_\prec \big( q(\Theta) + M^{-\gamma/4} \Theta^2 \big) \tag{6.2}$$

and

$$\phi \Lambda \prec \Theta + \tilde\Gamma q(\Theta). \tag{6.3}$$

Proof. For the whole proof we work on the event $\{\phi = 1\}$, i.e. every quantity is multiplied by $\phi$. We consistently drop these factors $\phi$ from our notation in order to avoid cluttered expressions. In particular, $\Lambda \le C M^{-\gamma/4}$ throughout the proof.

We begin by estimating $\Lambda_o$ and $\Lambda_d$ in terms of $\Theta$. Recalling (4.5), we find that $\phi$ satisfies the hypotheses of Lemma 5.2, from which we get

$$\Lambda_o + |\Upsilon_i| \prec r(\Lambda), \qquad r(\Lambda) := \sqrt{\frac{\operatorname{Im} m + \Lambda}{M\eta}}. \tag{6.4}$$

In order to estimate $\Lambda_d$, we expand the self-consistent equation (5.9) (on the event $\{\phi = 1\}$) to get

$$v_i - m^2 \sum_k s_{ik} v_k = O_\prec \big( \Lambda^2 + r(\Lambda) \big); \tag{6.5}$$

here we used the bound (6.4) on $|\Upsilon_i|$. Next, we subtract the average $[\,\cdot\,]$ from each side to get

$$(v_i - [v]) - m^2 \sum_k s_{ik} (v_k - [v]) = O_\prec \big( \Lambda^2 + r(\Lambda) \big).$$

Note that the average of the left-hand side vanishes, so that the average of the right-hand side also vanishes. Hence the right-hand side is perpendicular to $e$. Inverting the operator $1 - m^2 S$ on the subspace $e^\perp$ therefore yields

$$|v_i - [v]| \prec \tilde\Gamma \big( \Lambda^2 + r(\Lambda) \big). \tag{6.6}$$

Combining with the bound $\Lambda_o \prec r(\Lambda)$ from (6.4), we therefore get

$$\Lambda \prec \Theta + \tilde\Gamma \Lambda^2 + \tilde\Gamma r(\Lambda). \tag{6.7}$$

By definition of $\phi$ we have $\tilde\Gamma \Lambda^2 \le M^{-\gamma/4} \Lambda$, so that the second term on the right-hand side of (6.7) may be absorbed into the left-hand side:

$$\Lambda \prec \Theta + \tilde\Gamma r(\Lambda). \tag{6.8}$$

Now we claim that

$$r(\Lambda) \prec q(\Theta). \tag{6.9}$$
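If (6.9) is proved, clearly (6.3) follows from (6.8). In order to prove (6.9), we use (6.8) and the Cauchy-Schwarz inequality to get

$$r(\Lambda) \le \sqrt{\frac{\operatorname{Im} m}{M\eta}} + \sqrt{\frac{\Lambda}{M\eta}} \prec \sqrt{\frac{\operatorname{Im} m}{M\eta}} + \sqrt{\frac{\Theta}{M\eta}} + \sqrt{\frac{\tilde\Gamma r(\Lambda)}{M\eta}} \le \sqrt{\frac{\operatorname{Im} m}{M\eta}} + \sqrt{\frac{\Theta}{M\eta}} + M^{-\epsilon} r(\Lambda) + M^{\epsilon} \frac{\tilde\Gamma}{M\eta}$$

for any $\epsilon > 0$. We conclude that

$$r(\Lambda) \prec \sqrt{\frac{\operatorname{Im} m}{M\eta}} + \sqrt{\frac{\Theta}{M\eta}} + M^{\epsilon} \frac{\tilde\Gamma}{M\eta}.$$

Since $\epsilon > 0$ was arbitrary, (6.9) follows.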



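Next, we estimate $\Theta$. We expand (5.9) to second order:

$$-\sum_k s_{ik} v_k + \Upsilon_i = -\frac{1}{m^2} v_i + \frac{1}{m^3} v_i^2 + O(\Lambda^3). \tag{6.10}$$

In order to take the average and get a closed equation for $[v]$, we write, using (6.6),

$$v_i^2 = \big( [v] + v_i - [v] \big)^2 = [v]^2 + 2 [v] (v_i - [v]) + O_\prec \big( \tilde\Gamma^2 (\Lambda^2 + r(\Lambda))^2 \big).$$

Plugging this back into (6.10) and taking the average over $i$ gives

$$-m^2 [v] + m^2 [\Upsilon] = -[v] + m^{-1} [v]^2 + O_\prec \big( \Lambda^3 + \tilde\Gamma^2 \Lambda^4 + \tilde\Gamma^2 r(\Lambda)^2 \big).$$

Estimating $[\Upsilon]$ by $\max_i |\Upsilon_i| \prec r(\Lambda)$ (recall (6.4)) yields

$$(1 - m^2) [v] - m^{-1} [v]^2 = O_\prec \big( r(\Lambda) + \Lambda^3 + \tilde\Gamma^2 \Lambda^4 + \tilde\Gamma^2 r(\Lambda)^2 \big).$$

By definitions of $\tilde{\mathbf{S}}$ and $\phi$, we have $\tilde\Gamma^2 r(\Lambda) \le 1$. Therefore we may absorb the last error term into the first. For the second and third error terms we use (6.8) to get

$$(1 - m^2) [v] - m^{-1} [v]^2 = O_\prec \big( r(\Lambda) + \Theta^3 + \tilde\Gamma^3 r(\Lambda)^3 + \tilde\Gamma^2 \Theta^4 + \tilde\Gamma^6 r(\Lambda)^4 \big).$$

In order to conclude the proof of (6.2), we observe that, by the estimates $\Theta \le \Lambda \le C M^{-\gamma/4}$, $\tilde\Gamma^2 r(\Lambda) \le 1$, and $\Lambda \le M^{-\gamma/4} \tilde\Gamma^{-1}$, we have

$$\Theta^3 \le C M^{-\gamma/4} \Theta^2, \qquad \tilde\Gamma^3 r(\Lambda)^3 \le r(\Lambda), \qquad \tilde\Gamma^2 \Theta^4 \le \tilde\Gamma^2 \Lambda^2 \Theta^2 \le M^{-\gamma/2} \Theta^2, \qquad \tilde\Gamma^6 r(\Lambda)^4 \le r(\Lambda).$$

Putting everything together, we have

$$(1 - m^2) [v] - m^{-1} [v]^2 = O_\prec \big( r(\Lambda) + M^{-\gamma/4} \Theta^2 \big).$$

Hence (6.2) follows from (6.9). □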



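Next, we establish a bound analogous to Lemma 5.4, establishing gaps in the ranges of $\Lambda$ and $\Theta$. To that end, we need to partition $\tilde{\mathbf{S}}$ in two. For the following we fix $\epsilon \in (0, \gamma/12)$ and partition $\tilde{\mathbf{S}} = \tilde{\mathbf{S}}_> \cup \tilde{\mathbf{S}}_\le$, where

$$\tilde{\mathbf{S}}_> := \big\{ z \in \tilde{\mathbf{S}} : \sqrt{\kappa + \eta} > M^{\epsilon} (M\eta)^{-1/3} \big\}, \qquad \tilde{\mathbf{S}}_\le := \big\{ z \in \tilde{\mathbf{S}} : \sqrt{\kappa + \eta} \le M^{\epsilon} (M\eta)^{-1/3} \big\}.$$

The bound relies on (6.2), whereby one of the two terms on the left-hand side of (6.2) is estimated in terms of all the other terms, which are regarded as an error. In $\tilde{\mathbf{S}}_>$ we shall estimate the first term on the left-hand side of (6.2), and in $\tilde{\mathbf{S}}_\le$ the second. Figure 6.1 summarizes the estimates on $\Theta$ of Lemmas 6.3 and 6.4.

We begin with the domain $\tilde{\mathbf{S}}_>$. In this domain, the following lemma roughly says that if $\Theta \le M^{\epsilon/2} (M\eta)^{-1/3}$ and $\Lambda \le M^{-\gamma/4} \tilde\Gamma^{-1}$ then we get the improved bounds $\Theta \prec (M\eta)^{-1/3}$, $\Lambda \prec M^{-\gamma/3} \tilde\Gamma^{-1}$, i.e. we gain a small power of $M$. These improvements will be fed into the continuity argument as before.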
Figure 6.1. The $(\eta, \Theta)$-plane for a fixed $E$ near the edge (i.e. with small $\kappa$). The shaded regions are forbidden with high probability by Lemmas 6.3 and 6.4. The initial estimate, given by Lemma 5.5, is marked with a black dot. The graph of $\Theta = \Theta(E + i\eta)$ is continuous, and hence lies beneath the shaded regions.



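Lemma 6.3. Let $\epsilon \in (0, \gamma/12)$. Define the $z$-dependent indicator function

$$\chi := \mathbf{1}\big( \Theta \le M^{\epsilon/2} (M\eta)^{-1/3} \big)$$

and recall the indicator function $\phi$ from (6.1). In $\tilde{\mathbf{S}}_>$ we have the bounds

$$\phi \chi \Theta \prec (M\eta)^{-1/3}, \qquad \phi \chi \Lambda \prec M^{-\gamma/3} \tilde\Gamma^{-1}. \tag{6.11}$$

Proof. From the definition of $\tilde{\mathbf{S}}_>$ and (4.3) we get

$$\chi \Theta \le M^{\epsilon/2} (M\eta)^{-1/3} \le M^{-\epsilon/2} \sqrt{\kappa + \eta} \le C M^{-\epsilon/2} |1 - m^2|.$$

Therefore, on the event $\{\phi\chi = 1\}$, in (6.2) we may absorb the second term on the left-hand side and the second term on the right-hand side into the first term on the left-hand side:

$$\phi \chi |1 - m^2| \, \Theta \prec \phi \chi \, q(\Theta).$$

Recalling $|1 - m^2| \asymp \sqrt{\kappa + \eta}$ (see (4.3)), $\operatorname{Im} m \le C \sqrt{\kappa + \eta}$ (see (4.4)), $|[v]| = \Theta$, and the definition of $\tilde{\mathbf{S}}_>$, we get

$$\phi \chi \Theta \prec \frac{\phi \chi}{\sqrt{\kappa + \eta}} \Big( \sqrt{\frac{\operatorname{Im} m}{M\eta}} + \sqrt{\frac{\Theta}{M\eta}} + \frac{\tilde\Gamma}{M\eta} \Big) \le C (M\eta)^{-1/3}.$$

What remains is to estimate $\Lambda$. From (6.3), the bound $\tilde\Gamma^2 \sqrt{\operatorname{Im} m / (M\eta)} \le M^{-\gamma}$ from the definition of $\tilde{\mathbf{S}}$, and the estimate $\phi \tilde\Gamma \Theta \le \phi \tilde\Gamma \Lambda \le 1$ we get

$$\phi \chi \Lambda \prec \phi \chi \Theta + M^{-\gamma} \tilde\Gamma^{-1} + \tilde\Gamma \sqrt{\tilde\Gamma^{-1} (M\eta)^{-1}} + \tilde\Gamma^2 (M\eta)^{-1}.$$

By the first bound of (6.11) and the definition of $\tilde{\mathbf{S}}$, each term on the right-hand side is bounded by $C M^{-\gamma/3} \tilde\Gamma^{-1}$. This concludes the proof. □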
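Next, we establish a gap in the range of $\Lambda$, in the domain $\tilde{\mathbf{S}}_\le$. To that end, we improve the estimate on $\Lambda$ from $\Lambda \le M^{-\gamma/4} \tilde\Gamma^{-1}$ to $\Lambda \prec M^{\epsilon - \gamma/3} \tilde\Gamma^{-1}$ as before. In this regime there is no need for a gap in $\Theta$, i.e. the continuity argument will be performed on the value of $\Lambda$ only.

Lemma 6.4. In $\tilde{\mathbf{S}}_\le$ we have the bounds

$$\phi \Theta \prec M^{\epsilon} (M\eta)^{-1/3}, \qquad \phi \Lambda \prec M^{\epsilon - \gamma/3} \tilde\Gamma^{-1}. \tag{6.12}$$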



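Proof. We write (6.2) as

$$\phi [v] \big( 1 - m^2 - m^{-1} [v] \big) = \phi\, O_\prec \big( q(\Theta) + M^{-\gamma/4} \Theta^2 \big).$$

Solving this quadratic relation for $[v]$, we get

$$\phi \Theta \prec |1 - m^2| + \phi \sqrt{q(\Theta) + M^{-\gamma/4} \Theta^2}. \tag{6.13}$$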
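The step from the quadratic relation to (6.13) rests on an elementary dichotomy, sketched here for complex numbers $x, a, b$ and $\delta \ge 0$. The symbols $x$, $a$, $b$, $\delta$ are introduced only for this illustration; in the application one takes $x = [v]$, $a = 1 - m^2$, $b = m^{-1}$ (so that $|b| \ge 1$ by (4.2)), and $\delta$ the size of the right-hand side.

```latex
% Claim: if |x| |a - b x| <= \delta, then |x| <= 2|a|/|b| + \sqrt{2\delta/|b|}.
%
% Case 1: |b x| <= 2|a|.  Then trivially
|x| \le \frac{2|a|}{|b|}.
%
% Case 2: |b x| > 2|a|.  Then |a - b x| >= |b x| - |a| >= |b x|/2, so
\frac{|b|\,|x|^2}{2} \le |x|\,|a - b x| \le \delta
\quad\Longrightarrow\quad
|x| \le \sqrt{\frac{2\delta}{|b|}}.
%
% In either case, using |b| >= 1,
|x| \le 2|a| + \sqrt{2\delta}.
```

With $\delta$ of order $q(\Theta) + M^{-\gamma/4}\Theta^2$ this reproduces the shape of (6.13), up to constants that are absorbed into $\prec$.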
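Using (4.4), the bound $\tilde\Gamma \le M^{-\gamma/3} (M\eta)^{1/3} \le (M\eta)^{1/3}$ from the definition of $\tilde{\mathbf{S}}$, and Young's inequality, we estimate

$$\sqrt{q(\Theta) + M^{-\gamma/4} \Theta^2} \le (\operatorname{Im} m)^{1/4} (M\eta)^{-1/4} + \Theta^{1/4} (M\eta)^{-1/4} + \tilde\Gamma^{1/2} (M\eta)^{-1/2} + M^{-\gamma/8} \Theta.$$

Plugging this bound into (6.13), together with (4.3) and the definition of $\tilde{\mathbf{S}}_\le$, we find

$$\phi \Theta \prec \sqrt{\kappa + \eta} + M^{\epsilon} (M\eta)^{-1/3} \le 2 M^{\epsilon} (M\eta)^{-1/3}.$$

This proves the first bound of (6.12).

What remains is the estimate of $\Lambda$. From (6.3) and the bounds $\tilde\Gamma \le M^{-\gamma/3} (M\eta)^{1/3}$ and $\tilde\Gamma^2 \sqrt{\operatorname{Im} m / (M\eta)} \le M^{-\gamma}$ from the definition of $\tilde{\mathbf{S}}$, we get

$$\phi \Lambda \prec \phi \Theta + M^{-\gamma} \tilde\Gamma^{-1} + \tilde\Gamma \sqrt{\tilde\Gamma^{-1} (M\eta)^{-1}} + \tilde\Gamma^2 (M\eta)^{-1}.$$

By the first bound of (6.12) and the definition of $\tilde{\mathbf{S}}$, each term on the right-hand side is bounded by $C M^{\epsilon - \gamma/3} \tilde\Gamma^{-1}$. This concludes the proof. □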



We now have all of the ingredients to complete the proof of Proposition 6.1.

Proof of Proposition 6.1. The proof is a continuity argument similar to the proof of Proposition 5.3. In a first step, we prove that

$$\Lambda \prec M^{-\gamma/3} \tilde\Gamma^{-1}, \qquad \Theta \prec (M\eta)^{-1/3} \tag{6.14}$$

in $\tilde{\mathbf{S}}_>$. The continuity argument is almost identical to that following (5.21); the only difference is that we keep track of the two parameters $\Lambda$ and $\Theta$. The required gaps in the ranges of $\Lambda$ and $\Theta$ are provided by (6.11), and the argument is closed using the large-$\eta$ estimate from Lemma 5.5, which yields $\Theta \le \Lambda \prec M^{-1/2}$ for $\eta = 2$.

In a second step, we prove that

$$\Lambda \prec M^{\epsilon - \gamma/3} \tilde\Gamma^{-1}, \qquad \Theta \prec M^{\epsilon} (M\eta)^{-1/3}$$

in $\tilde{\mathbf{S}}_\le$. This is again a continuity argument almost identical to that following (5.21). Now we establish a gap only in the range of $\Lambda$. The gap is provided by (6.12) (recall that by definition of $\epsilon$ we have $\epsilon - \gamma/3 < -\gamma/4$), and the argument is closed using the bound (6.14) at the boundary of the domains $\tilde{\mathbf{S}}_>$ and $\tilde{\mathbf{S}}_\le$.

The claim now follows since we may choose $\epsilon \in (0, \gamma/12)$ to be arbitrarily small. This concludes the proof of Proposition 6.1. □
6.2. An improved bound on $\Lambda$ in terms of $\Theta$. In (6.3) we already estimated $\Lambda$ in terms of $\Theta$; the goal of this section is to improve this bound by removing the factor $\tilde\Gamma$ from that estimate. We do this using the Fluctuation Averaging Theorem, but we stress that the removal of a factor $\tilde\Gamma$ is not the main rationale for using the fluctuation averaging mechanism. Its fundamental use will take place in Lemma 6.6 below. A technical consequence of invoking fluctuation averaging is that we have to use deterministic control parameters instead of random ones. Thus, we introduce a deterministic control parameter $\Phi$ that captures the size of the random control parameter $\Theta$ through the relation $\Theta \prec \Phi$. Throughout the following we shall make use of the control parameter

$$p(\Phi) := \sqrt{\frac{\operatorname{Im} m + \Phi}{M\eta}} + \frac{1}{M\eta},$$

which differs from $q(\Phi)$ only by a factor $\tilde\Gamma$ in the second term.

Lemma 6.5. Suppose that $\Lambda \prec \Psi$ and $\Theta \prec \Phi$ in $\tilde{\mathbf{S}}$ for some deterministic control parameters $\Psi$ and $\Phi$ satisfying

$$c M^{-1/2} \le \Psi \le C M^{-\gamma/3} \tilde\Gamma^{-1}, \qquad \Phi \le C M^{-\gamma/3} \tilde\Gamma^{-1}. \tag{6.15}$$

Then

$$\Lambda_o + |Z_i| \prec p(\Phi), \qquad \Lambda \prec p(\Phi) + \Phi. \tag{6.16}$$

We remark that, by Proposition 6.1, the choice $\Psi = M^{-\gamma/3} \tilde\Gamma^{-1}$ and $\Phi = (M\eta)^{-1/3} \le M^{-\gamma/3} \tilde\Gamma^{-1}$ satisfies the assumptions of Lemma 6.5.

Proof of Lemma 6.5. Choosing $\phi = 1$ in Lemma 5.2 and recalling (4.5), we get

\[ \Lambda_o + |Z_i| + |\Upsilon_i| \prec r(\Psi), \qquad r(\Psi) := \sqrt{\frac{\operatorname{Im} m + \Psi}{M\eta}}. \qquad (6.17) \]

In order to estimate $\Lambda_d$, as in (5.32), we expand (5.9) to get

(6.18)

As in the proof of (5.32) and (6.5), the expansion of (5.9) is only possible on the event $\{\Lambda \le M^{-\delta}\}$ for some $\delta > 0$. By $\Lambda \prec \Psi$ and (6.15), the indicator function of this event is $1 + O_\prec(0)$; the contribution $O_\prec(0)$ of the complementary event can be absorbed in the error term $O_\prec(\Psi^2)$.

Subtracting the average N^^ from both sides of ( 6.18| ) and estimating by a constant (see ( |4.2[ )) 
yields 



^Sik{vk - M) - (Ti - [T] 



C»^(1'2) -< F*2 +r(*) 



(6.19) 



where in the last step we used the fluctuation averaging estimate (4.12) and $|\Upsilon_i| \prec r(\Psi)$ from (6.17). Together with $|[v]| = \Theta \prec \Phi$, this gives the estimate $\Lambda_d \prec \widetilde\Gamma\Psi^2 + \Phi + r(\Psi)$. Combining it with the bound (6.17), we conclude that

\[ \Lambda \prec \widetilde\Gamma\Psi^2 + \Phi + r(\Psi). \qquad (6.20) \]

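Now fix $\varepsilon \in (0, \gamma/4)$. Using the assumption $\widetilde\Gamma\Psi \le CM^{-\gamma/4} \le M^{-\varepsilon}$, we conclude: if $\Psi$ and $\Phi$ satisfy (6.15) then

\[ \Lambda \prec \Psi \quad\Longrightarrow\quad \Lambda \prec F(\Psi,\Phi), \qquad (6.21) \]

where we defined

\[ F(\Psi,\Phi) := M^{-\varepsilon}\Psi + \sqrt{\frac{\operatorname{Im} m}{M\eta}} + \frac{M^{\varepsilon}}{M\eta} + \Phi, \]

which plays a role similar to $F(\cdot)$ in Proposition 5.6. (Here we estimated $\sqrt{\Psi(M\eta)^{-1}}$ in $r(\Psi)$ by $M^{-\varepsilon}\Psi + M^{\varepsilon}(M\eta)^{-1}$.) From (4.4) and the definition of $\widetilde{\mathbf S}$ it easily follows that if $(\Psi,\Phi)$ satisfy (6.15) then so do $(F(\Psi,\Phi),\Phi)$. Therefore iterating (6.21) $\lceil\varepsilon^{-1}\rceil$ times and using the fact that $\varepsilon \in (0,\gamma/4)$ was arbitrary yields

\[ \Lambda \prec \sqrt{\frac{\operatorname{Im} m}{M\eta}} + \frac{1}{M\eta} + \Phi. \qquad (6.22) \]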
This implies the claimed bound (6.16) on $\Lambda$. Calling the right-hand side of (6.22) $\widetilde\Psi$, we find

\[ r(\widetilde\Psi) \le C\,p(\Phi). \qquad (6.23) \]

Hence the claimed bound (6.16) on $\Lambda_o$ and $Z_i$ follows from (6.17). □



6.3. Iteration for $\Theta$ and conclusion of the proof of Theorem 2.3. Next, we prove the following version of (5.9), which is the key tool for estimating $\Theta$.

Lemma 6.6. Let $\Phi$ be some deterministic control parameter satisfying $\Theta \prec \Phi$ in $\widetilde{\mathbf S}$. Then

\[ (1 - m^2)[v] - m^{-1}[v]^2 = O_\prec\big(p(\Phi)^2 + M^{-\gamma/4}\Phi^2\big). \qquad (6.24) \]

Notice that this bound is stronger than the previous formula (6.2), as the power of $p(\Phi)$ is two instead of one. The improvement is due to using fluctuation averaging in $[\Upsilon]$. Otherwise the proof is very similar to that of (6.2).

Proof. By Proposition 6.1 we may assume that

\[ \Phi \le M^{-\gamma/4}\,\widetilde\Gamma^{-1} \qquad (6.25) \]

since $\Theta \le \Lambda \prec M^{-\gamma/4}\widetilde\Gamma^{-1}$. From Lemma 6.5 we get $\Lambda_o + |Z_i| \prec p(\Phi)$ and $\Lambda \prec \Psi$, where

\[ \Psi := p(\Phi) + \Phi. \qquad (6.26) \]

By definition of $\widetilde{\mathbf S}$ and (6.25), we find that $\Psi \le 2M^{-\gamma/4}\widetilde\Gamma^{-1}$.

Now we expand the right-hand side of (5.9) exactly as in (6.10) to get

(6.27)

Using Theorem 4.7 and the bound $\Lambda_o \prec p(\Phi)$ from Lemma 6.5 we may prove, exactly as in Lemma 5.7, that $|[\Upsilon]| \prec p(\Phi)^2$. Taking the average over $i$ in (6.27) therefore yields

(6.28)

Using the estimates (6.19) and (6.23), we write the quadratic term on the left-hand side as



where we also used $\widetilde\Gamma\Psi \le 2M^{-\gamma/4}$ as observed after (6.26). From (6.28) we therefore get

\[ (1 - m^2)[v] - m^{-1}[v]^2 = O_\prec\big(p(\Phi)^2 + M^{-\gamma/4}\Psi^2\big). \]

The claim follows from (6.26). □

The bound on $\Theta$ will follow by iterating the following estimate.

Lemma 6.7. Fix $\varepsilon \in (0, \gamma/12)$ and suppose that $\Theta \prec \Phi$ in $\widetilde{\mathbf S}$ for some deterministic control parameter $\Phi$.

(i) If $\Phi \ge M^{3\varepsilon}(M\eta)^{-1}$ then

\[ \Theta \prec M^{-\varepsilon}\Phi. \]

(ii) If \E\ ^ 2, j^jf . $ < NF^/J^, and Mri^/i^ > NP' , then 



e < 



(6.29) 



(6.30) 
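Proof. We begin by partitioning $\widetilde{\mathbf S} = \widetilde{\mathbf S}_> \cup \widetilde{\mathbf S}_\le$. This partition is analogous to the partition $\widetilde{\mathbf S} = \widetilde{\mathbf S}_> \cup \widetilde{\mathbf S}_\le$ from Section 6.1 and will determine which of the two terms on the left-hand side of (6.24) is estimated in terms of the others. Here

\[ \widetilde{\mathbf S}_> := \{z \in \widetilde{\mathbf S} : \sqrt{\kappa+\eta} > M^{-\varepsilon}\Phi\}, \qquad \widetilde{\mathbf S}_\le := \{z \in \widetilde{\mathbf S} : \sqrt{\kappa+\eta} \le M^{-\varepsilon}\Phi\}. \]

We begin with the domain $\widetilde{\mathbf S}_>$. Let $K > 0$ be a constant large enough that

\[ \frac{\sqrt{\kappa+\eta}}{K} \le \frac{1}{2}\,|1 - m^2|\,|m|; \]

such a constant exists by (4.2) and (4.3). Define the indicator function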



(6.31) 



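Hence on the event $\{\psi = 1\}$ we may absorb the quadratic term on the left-hand side of (6.24) into the linear term, to get the bound

\[ \psi\Theta \prec \frac{1}{\sqrt{\kappa+\eta}}\Big(\frac{\operatorname{Im} m + \Phi}{M\eta} + \frac{1}{(M\eta)^2} + M^{-\gamma/4}\Phi^2\Big) \le C\Big(\frac{M^{\varepsilon}}{M\eta} + M^{\varepsilon-\gamma/4}\Phi\Big) \le CM^{-2\varepsilon}\Phi, \qquad (6.32) \]

where in the second step we used (4.4), the assumption $(M\eta)^{-1} \le M^{-3\varepsilon}\Phi$, and the definition of $\widetilde{\mathbf S}_>$. We conclude that in $\widetilde{\mathbf S}_>$ we have

\[ \psi\Theta \prec M^{-2\varepsilon}\Phi \le M^{-\varepsilon}\sqrt{\kappa+\eta}, \qquad (6.33) \]

where in the last step we used the definition of $\widetilde{\mathbf S}_>$. This means that there is a gap of order $\sqrt{\kappa+\eta}$ between the bound in the definition of $\psi$ in (6.31) and the right-hand side of (6.33). Moreover, by Proposition 6.1 we have $\Theta \prec M^{-\varepsilon}\sqrt{\kappa+\eta}$ for $\eta = 2$. Hence a continuity argument on $\Theta$, similar to the proof of Proposition 5.3, yields (6.29) in $\widetilde{\mathbf S}_>$.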

Let us now consider the domain $\widetilde{\mathbf S}_\le$. We write the left-hand side of (6.24) as $(1 - m^2 - m^{-1}[v])[v]$. Solving the resulting equation for $[v]$, as in the proof of (6.13), yields the bound

' 1 



9 -< |l-m^|+p 



I Im TO -t- $ 
Mr] 



Mr] 



^ CM-^^- 



M^ 
Mr] 



CM-^$, (6.34) 



where we used the definition of $\widetilde{\mathbf S}_\le$ and the bounds (4.3) and (4.4). This proves (6.29) in $\widetilde{\mathbf S}_\le$, and hence completes the proof of part (i) of Lemma 6.7.



The proof of part (ii) is analogous. In this case we are in the domain $\widetilde{\mathbf S}_>$, and use the estimate $\operatorname{Im} m \le C\eta(\kappa+\eta)^{-1/2}$ from (4.4) instead of $\operatorname{Im} m \le C\sqrt{\kappa+\eta}$ in (6.32). Using the other assumptions in part (ii), we have

V-e ^ jjj-J^j===+CM-^^^ < Af-^V^T^, (6-35) 



{Mr]Y^ K + Tj 



which replaces (6.32) and (6.33). The rest of the argument is unchanged. 



□ 



Armed with Lemma 6.7 we may now complete the proof of Theorem 2.3. Fix $\varepsilon \in (0, \gamma/12)$. From Proposition 6.1 we get that $\Theta \prec \Phi_0$ for $\Phi_0 := (M\eta)^{-1/3} + M^{3\varepsilon}(M\eta)^{-1}$. Iteration of Lemma 6.7 therefore implies that, for all $k \in \mathbb N$, we have $\Theta \prec \Phi_k$ where

\[ \Phi_k := \frac{M^{3\varepsilon}}{M\eta} + \frac{M^{-k\varepsilon}}{\sqrt[3]{M\eta}}. \]

Choosing $k = \lceil\varepsilon^{-1}\rceil$ yields $\Theta \prec M^{3\varepsilon}(M\eta)^{-1}$. Since $\varepsilon$ can be made as small as desired, we therefore obtain $\Theta \prec (M\eta)^{-1}$. This is (2.19).

In the regime $|E| \ge 2$, the same argument with the better iteration bound (6.30) yields (2.20). The iteration can be started with $\Phi_0 = M^{\varepsilon}(M\eta)^{-1}$ from (2.19).

Finally, the bound $\Lambda \prec \Pi$ in (2.18) follows from (2.19) and Lemma 6.5. This concludes the proof of Theorem 2.3.
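The self-improving iteration used above can be illustrated numerically. The following is only a sketch under stated assumptions: it takes part (i) of Lemma 6.7 in the form "if $\Phi \ge M^{3\varepsilon}(M\eta)^{-1}$ then the bound improves to $M^{-\varepsilon}\Phi$", and it models the threshold by a simple maximum rather than by the additive form of $\Phi_k$. The function name and the numerical values of `M`, `eta` and `eps` are hypothetical choices for illustration.

```python
import math

def iterate_control_parameter(M, eta, eps):
    """Iterate Phi -> M^{-eps} * Phi down to the floor M^{3 eps} / (M eta).

    Assumption (illustrative): each step gains a factor M^{-eps} as long as
    Phi stays above the floor; below the floor no further gain is available.
    """
    floor = M ** (3 * eps) / (M * eta)            # threshold M^{3 eps} (M eta)^{-1}
    phi = (M * eta) ** (-1.0 / 3.0) + floor       # starting bound Phi_0
    history = [phi]
    for _ in range(math.ceil(1.0 / eps)):         # k = ceil(1/eps) iterations
        phi = max(M ** (-eps) * phi, floor)       # improved bound, never below the floor
        history.append(phi)
    return history, floor

history, floor = iterate_control_parameter(M=1e6, eta=1e-2, eps=0.05)
print(history[0], history[-1], floor)
```

After finitely many steps the bound saturates at the floor $M^{3\varepsilon}(M\eta)^{-1}$, which mirrors the conclusion $\Theta \prec M^{3\varepsilon}(M\eta)^{-1}$ obtained with $k = \lceil\varepsilon^{-1}\rceil$.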
7. Density of states and eigenvalue locations

In this section we apply the local semicircle law to obtain information on the density of states and on the location of eigenvalues. The techniques used here have been developed in a series of papers [3, 10, 12, 16].

The first result is to translate the local semicircle law, Theorem 2.3, into a statement on the counting function of the eigenvalues. Let $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_N$ denote the ordered eigenvalues of $H$, and recall the semicircle density $\varrho$ defined in (2.7). We define the distribution functions

\[ n(E) := \int_{-\infty}^{E} \varrho(x)\,dx, \qquad n_N(E) := \frac{1}{N}\,\big|\{\alpha : \lambda_\alpha \le E\}\big| \qquad (7.1) \]

for the semicircle law and the empirical eigenvalue density of $H$. Recall also the definition (2.15) of $\kappa_x$ for $x \in \mathbb R$ and the definition (2.14) of $\widetilde\eta_x$ for $|x| \le 10$. The following result is proved in Section 7.1 below.



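Lemma 7.1. Suppose that (2.19) holds uniformly in $z \in \widetilde{\mathbf S}$, i.e. for $|E| \le 10$ and $\widetilde\eta_E \le \eta \le 10$ we have

\[ |m_N(z) - m(z)| \prec \frac{1}{M\eta}. \qquad (7.2) \]

For given $E_1 < E_2$ in $[-10, 10]$ we abbreviate

\[ \widetilde\eta := \max\{\widetilde\eta_E : E \in [E_1, E_2]\}. \qquad (7.3) \]

Then, for $-10 \le E_1 < E_2 \le 10$, we have

\[ \big(n_N(E_2) - n_N(E_1)\big) - \big(n(E_2) - n(E_1)\big) = O_\prec(\widetilde\eta). \qquad (7.4) \]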



The accuracy of the estimate (7.4) depends on $\widetilde\Gamma$ (see (A.3) for explicit bounds on $\widetilde\Gamma$), since $\widetilde\Gamma$ determines $\widetilde\eta_E$, the smallest scale on which the local semicircle law (Theorem 2.3) holds around the energy $E$. In the regime away from the spectral edges $E = \pm 2$ and away from $E = 0$, the parameter $\widetilde\Gamma$ is essentially bounded (see the example (i) from Section 3); in this case $\widetilde\eta_E \asymp M^{-1}$ (up to an irrelevant logarithmic factor). For $E$ near 0, the parameter $\widetilde\Gamma$ blows up as $E^{-2}$, so that $\widetilde\eta_E \approx E^{-6}M^{-1}$; however, if $S$ has a positive gap $\delta_-$ at the bottom of its spectrum, $\widetilde\Gamma$ remains bounded in the vicinity of $E = 0$ (see (A.3)). See Definition A.1 in Appendix A for the definition of the spectral gaps $\delta_\pm$.

A typical example of $S$ without a positive gap $\delta_-$ is a $2 \times 2$ block matrix with zero diagonal blocks, i.e. $s_{ij} = 0$ if $i, j \le L$ or $L+1 \le i, j \le N$. In this case, the vector $v = (1, 1, \ldots, 1, -1, -1, \ldots, -1)$ consisting of $L$ ones and $N - L$ minus ones satisfies $Sv = -v$, so that $-1$ is in fact an eigenvalue of $S$. Since at energy $E = 0$ we have $m^2(z) = m^2(i\eta) = -1 + O(\eta)$, the inverse matrix $(1 - m^2 S)^{-1}$, even after restricting it to $e^\perp$, becomes singular as $\eta \to 0$. Thus, $\widetilde\Gamma(i\eta) \sim \eta^{-1}$, and the estimates leading to Theorem 2.3 become unstable.

The corresponding random matrix has the form

\[ H = \begin{pmatrix} 0 & A \\ A^* & 0 \end{pmatrix}, \]

where $A$ is an $L \times (N - L)$ rectangular matrix with independent centred entries. The eigenvalues of $H$ are the square roots (with both signs) of the eigenvalues of the random covariance matrices $AA^*$ and $A^*A$, whose spectral density is asymptotically given by the Marchenko-Pastur law [19]. The instability near $E = 0$ arises from the fact that $H$ has a macroscopically large kernel unless $L/N \to 1/2$. In the latter case the support of the Marchenko-Pastur law extends to zero and in fact the density diverges as $E^{-1/2}$. We remark that a local
version of the Marchenko-Pastur law was given in |'12: for the case when the limit of L/N differs from 0, 1/2 
and oo\ the "hard edge" case, L/N — !■ 1/2, in which the density near the lower spectral edge is singular, was 
treated in |T^. 
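The two algebraic facts behind this example can be checked directly on a small instance. The sketch below is illustrative only: it assumes $L = N/2$ and off-diagonal block entries $1/L$ (so that every row of $S$ sums to one), and the block `A` is an arbitrary hypothetical real matrix rather than a random one. It verifies that $Sv = -v$ and that $H^2$ is block diagonal with blocks $AA^*$ and $A^*A$, which is why the eigenvalues of $H$ are signed square roots of those of the covariance matrices.

```python
def matvec(S, v):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(s * x for s, x in zip(row, v)) for row in S]

def matmul(A, B):
    """Matrix-matrix product for matrices given as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

L, N = 3, 6  # assumption: L = N/2, so each row of S sums to 1

# Variance matrix with zero diagonal blocks and constant off-diagonal blocks.
S = [[0.0 if (i < L) == (j < L) else 1.0 / L for j in range(N)] for i in range(N)]
v = [1.0] * L + [-1.0] * (N - L)
Sv = matvec(S, v)  # should equal -v

# Hypothetical real L x (N - L) block, for illustration only.
A = [[1.0, 2.0, 0.5], [0.0, -1.0, 3.0], [2.0, 1.0, 1.0]]
At = [list(col) for col in zip(*A)]
H = [[0.0] * L + A[i] for i in range(L)] + [At[i] + [0.0] * L for i in range(N - L)]
H2 = matmul(H, H)     # block diagonal: diag(A A^T, A^T A)
AAt = matmul(A, At)
```

Here `Sv` reproduces $-v$ and the off-diagonal blocks of `H2` vanish, while its upper-left block agrees with `AAt`.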

This example shows that the vanishing of $\delta_-$ may lead to a very different behaviour of the spectral statistics. Although our technique is also applicable to random covariance matrices, for simplicity in this section we assume that $\delta_- \ge c$ for some positive constant $c$. By Proposition A.3 this holds for random band matrices, for full Wigner matrices (see Definition 3.1), and for their combinations; these examples are our main interest in this paper.



Under the condition $\delta_- \ge c$, the upper bound of (A.3) yields

\[ \widetilde\Gamma \le \frac{C\log N}{\delta_+ + \theta}, \qquad (7.5) \]

where $\theta$ was defined in (3.2) and $\delta_+$ is the upper gap of the spectrum of $S$ given in Definition A.1. Notice that $\theta$ vanishes near the spectral edge $E = \pm 2$ as $\eta \to 0$. For the purpose of estimating $\widetilde\Gamma$, this deterioration is mitigated if the upper gap $\delta_+$ is non-vanishing. While full Wigner matrices satisfy $\delta_+ \ge c$, the lower bound on $\delta_+$ for band matrices is weaker; see Proposition A.3 for a precise statement.

We first give an estimate on $\widetilde\eta_x$ using the explicit bound (7.5). While not fully optimal, this estimate is sufficient for our purposes and in particular reproduces the correct behaviour when $\delta_+ \ge c$.



Lemma 7.2. Suppose that $\delta_- \ge c$ (so that (7.5) holds). Then we have for any $|x| \le 2$



In the regime $2 \le |x| \le 10$ we have the improved bound

Af(V^ + ^+ + Af-i/5)3 
Proof. For any |a;| ^ 2 define rj'^ as the solution of the equation 



(7.6) 



(7.7) 



1 



f 



f 



(7.1 



y Mr] {k^ + ?y2/3 + ^^)2 Mt] {k^ + rf/^ + S+)^ 
This solution is unique since the left-hand side is decreasing in 77. An elementary but tedious analysis of 



(7.8) yields 



^ M(ac, + 5+ + M-i/5)7/2 • (7-9) 

(The calculation is based on the observation that if 77(0 + ry") ^ b for some a,b > and a ^ 0, then 
77 ^ 26(6t+J» + a)^^.) From (7.5), lmm(a; + irj) ^ C^k^ + 77 (see (4.4|) and the simple bound 9(x + irf) 
c{kx + if^^), we get for r/ ^ 77^ 



/ lmm(x + 77?) ~2 ^ ^ ^ ^ C(log7V)3M-1?. 

M77 



f 



From the definition 



The proof of (7.7 



2.f7) of S, we therefore get 77^ ^ 7;^, which proves (7.6). 

) is similar, but we use 9 = + rj and the stronger bound Imm ^ rj/^n + rj available 



in the regime |x| ^ 2. For 2 ^ |x| ^ 10, define 77^ to be the solution of the equation 



3"l 



(7.10) 



As for (7.9), a tedious calculation yields 



Vx ^ 



This concludes the proof. 

Next, we obtain an estimate on the extreme eigenvalues. 



□ 
Theorem 7.3 (Extremal eigenvalues). Suppose that $\delta_- \ge c$ (so that (7.5) holds) and that $N^{3/4} \le M \le N$. Then we have

\[ \|H\| \le 2 + O_\prec(X), \qquad (7.11) \]

where we introduced the control parameter



X : = 



In particular, if S-^^ ^ c then 



/ N 



\H\\ < 2 + 0. 



^ N \ 1/7- 



-12 



(7.12) 



(7.13) 



Note that (7.13) yields the optimal error bound $O_\prec(N^{-2/3})$ in the case of a full and flat Wigner matrix (see Definition 3.1). Under stronger assumptions on the law of the entries of $H$, Theorem 7.3 can be improved as follows.

Theorem 7.4. Suppose that the matrix elements hij have a uniform subexponential decay, i.e. that there 
exist positive constants C and d such that 



Then ( |7.11[ ) holds with 



X := M-'/\ 



(7.14) 
(7.15) 



If in addition the law of each matrix entry is symmetric (i.e. hij and ~hij have the same law), then (7.11) 
holds with 

X := M-^/^ . (7.16) 



We remark that (7.15) can be obtained via a relatively standard moment method argument combined with refined combinatorics. Obtaining the bound (7.16) is fairly involved; it makes use of the Chebyshev polynomial representation first used by Feldheim and Sodin [17, 22] in this context for a special distribution of $h_{ij}$, and extended in [2] to general symmetric entries.



Proof of Theorem 7.3. We shall prove a lower bound on the smallest eigenvalue $\lambda_1$ of $H$; the largest eigenvalue $\lambda_N$ may be estimated similarly from above. Fix a small $\gamma > 0$ and set



i := M^^ 



MS/3 



We distinguish two regimes depending on the location of Ai, i.e. we decompose 

l(Ai s^-2-£) = 01+02, 

where 

01 l(-3 Ai -2-^), 02 l(Ai ^ -3). 
In the first regime we further decompose the probability space by estimating 



k=0 



>i,k , 0i,fc 1 ( -2 - £ - ^ Ai jc; -2 - ^ - ^ 



The upper bound fco is the smallest integer such that 2 
set 



fco + l 
N 



> 3; clearly ko ^ N. For any k ^ ko we 



Zk ■= Ek + iiik , Ek ■■= -2 - Kfc , Kk (.+ 



k 
N 



Vk 



N 



Clearly, rjk ^ Hk since M ^ N . On the support of 0i.fe we have |Ai — Ek\ ^ C/N ^ 77^, so that we get the 
lower bound 



0i,felmm7v(2:fc) = 



1 



Vk 



'''^l^ii^o.-Ekr + 4 " "^''"N iX,-Ekr+vl " Nr,k 



(7.17) 
for some positive constant $c$. On the other hand, by (4.4), we have



Therefore we get 



(?!)i,fc|lmmAr(zfc) - Inim(zfe)| ^ 



Cf]k_ 



(7.18) 



for some positive constant c'. Here in the second step we used that ??fc/y^Kfe ^ M '^{Nijk) ^■ 

Suppose for now that 6^ ^ c. Then by (7.6) we have the upper bound rjx ^ CM''^^^, uniformly for 
^ 10. Since T]k ^ CM^''~^ we find that G S with |Re2;fe| ^ 2. Hence (2.20) apphes for z = Zk and we 

get 



|lmm7v(zfe) — Imm(zfe)| -< 



1 



+ 



1 



MKk (Mrjkyy/Ji^ 



€ CM- 



1 



(7.19) 



Comparing this bound with (7.18) we conclude that $\phi_{1,k} \prec 0$ (i.e. the event $\{\phi_{1,k} = 1\}$ has very small probability). Summing over $k$ yields $\phi_1 \prec 0$. Note that in this proof the stronger bound (2.20) outside of the spectrum was essential; the general bound of order $(M\eta_k)^{-1}$ from (2.19) is not smaller than the right-hand side of (7.18).

The preceding proof of $\phi_1 \prec 0$ assumed the existence of a spectral gap $\delta_+ \ge c$. The above argument easily carries over to the case without a gap of constant size, in which case we choose



M8/3 



+ 



N 
M2 



N 



1/7- 



N 



Vk 



N 



The last term in $\eta_k$ guarantees that $z_k \in \widetilde{\mathbf S}$, by (7.7). Then we may repeat the above proof to get $\phi_1 \prec 0$ for the new function $\phi_1$.

All that remains to complete the proof of (7.11) and (7.13) is the estimate $\phi_2 \prec 0$. Clearly



is=:-3) ^ E|{j:A, s$-3}|. 

In part (2) of Lemma 7.2 in [14 it was shown, using the moment method, that the right-hand side is bounded 
by CA^^"^'"^'"^^ provided the matrix entries hij have subexponential decay, i.e. 



(x > 0) , 



for some constants $\alpha, \beta$ (recall the notation (2.5)). In this paper we only assume polynomial decay, (2.6). However, the subexponential decay assumption of [14] was only used in the first truncation step, Equations (7.28)-(7.29) in [14], where a new set of independent random variables $\hat h_{ij}$ was constructed with the properties that



^ 1 - e" 



EC„- =0, E|C„| E|C. 



+ c" 



(7.20) 



for $n = (\log N)(\log\log N)$. Under the condition (2.6) the same truncation can be performed, but the estimates in (7.20) will be somewhat weaker; instead of the exponent $n = (\log N)(\log\log N)$ we get $n = D\log N$ for any fixed $D > 0$. The conclusion of the same proof is that, assuming only (2.6), we have

: Xj < ~3}| ^ (7.21) 

for any positive number $D$ and for any $N \ge N_0(D)$. This guarantees that $\phi_2\|H\| \prec 0$. Together with the estimate $\phi_1 \prec 0$ established above, this completes the proof of Theorem 7.3. □



Proof of Theorem 17.41 The estimate of \\H\\ with X ~ M^^^^ follows from the proof of part (2) of 
Lemma 7.2 in [14], by choosing k — M^^^^^^ with any small £ > in (7.32) of jl4j. This argument can be 
improved to X = M~^/^ by the remark after (7.18) in [1^. Finally, the bound with X = M~^^^ under the 
symmetry condition on the entries of H is proved in Theorem 3.4 of [2]. □ 
Next, we establish an estimate on the normalized counting function $n_N$ defined in (7.1). As above, the exponents are not expected to be optimal, but the estimate is in general sharp if $\delta_+ \ge c$.

Theorem 7.5 (Eigenvalue counting function). Suppose that $\delta_- \ge c$ (so that (7.5) holds). Then

\[ \sup_{E \in \mathbb R}\,|n_N(E) - n(E)| = O_\prec(Y), \qquad (7.22) \]

where we introduced the control parameter

\[ Y := \frac{1}{M}\Big(\frac{1}{\delta_+ + M^{-1/5}}\Big)^{7/2}. \qquad (7.23) \]



Proof. First we prove the bound (7.22) for any fixed E e [—10,10]. Define the dyadic energies Ek ■= 
-2 - 2^{S+ + M-i/5). By (fT^el) we have for all fc ^ 



max{r/B : E £ [Ek+i,Ek]} 



[2*^-(5+ + M-i/5)] 



7/2 ■ 



A similar bound holds for E',^ := -2 + 2''{S+ + M''^/^). For any E £ [-10,0], we express nN^E) - n{E) as 
a telescopic sum and use ( 7.4 ) to get 



IxxNiE) - n{E)\ < InAr(-lO) - n(-10)| +Y,\{^N{Ek+i) ~ nAr(Sfc)) - {n{Ek+i) ~ n{Ek)) 



k>0 



-< 



Af-i+4^(<5++Af-i/5)-7/2. 



(7.24) 



Here we used that $n(-10) = 0$ and $n_N(-10) \le n_N(-3) \prec 0$ by (7.21). In fact, (7.24) easily extends to any $E < -10$. By an analogous dyadic analysis near the upper spectral edge, we also get (7.24) for any $E \ge 0$. Since this holds for any $\gamma > 0$, we thus proved

\[ |n_N(E) - n(E)| \prec Y \qquad (7.25) \]

for any fixed $E \in [-10, 10]$.

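To prove the statement uniformly in $E$, we define the classical location $\gamma_\alpha$ of the $\alpha$-th eigenvalue through

\[ \int_{-\infty}^{\gamma_\alpha} \varrho(x)\,dx = \frac{\alpha}{N}. \qquad (7.26) \]

Applying (7.25) for the $N$ energies $E = \gamma_1, \ldots, \gamma_N$, we get

\[ \Big|n_N(\gamma_\alpha) - \frac{\alpha}{N}\Big| \prec Y \qquad (7.27) \]

uniformly in $\alpha = 1, \ldots, N$. Since $n_N(E)$ and $n(E)$ are nondecreasing and $Y \ge 1/N$, we find

\[ \sup\{n_N(E) - n(E) : \gamma_{\alpha-1} \le E \le \gamma_\alpha\} \le n_N(\gamma_\alpha) - n(\gamma_{\alpha-1}) = n_N(\gamma_\alpha) - n(\gamma_\alpha) + \frac{1}{N} = O_\prec(Y) \]

uniformly in $\alpha = 2, 3, \ldots, N$. Below $\gamma_1$ we use (7.27) to get

\[ \sup_{E \le \gamma_1}\big(n_N(E) - n(E)\big) \le n_N(\gamma_1) = O_\prec(Y). \]

Finally, for any $E \ge \gamma_N$, we have $n_N(E) - n(E) = n_N(E) - 1 \le 0$ deterministically. Thus we have proved

\[ \sup_{E \in \mathbb R}\big(n_N(E) - n(E)\big) = O_\prec(Y). \]

A similar argument yields $\inf_{E \in \mathbb R}(n_N(E) - n(E)) = O_\prec(Y)$. This concludes the proof of Theorem 7.5. □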
Next, we derive rigidity bounds on the locations of the eigenvalues. Recall the definition of $\gamma_\alpha$ from (7.26).

Theorem 7.6 (Eigenvalue locations). Suppose that $\delta_- \ge c$ (so that (7.5) holds) and that (7.11) and (7.22) hold with some positive control parameters $X, Y \le C$. Define $\hat\alpha := \min\{\alpha, N + 1 - \alpha\}$ and let $\varepsilon > 0$ be arbitrary. Then

\[ |\lambda_\alpha - \gamma_\alpha| \prec Y\Big(\frac{N}{\hat\alpha}\Big)^{1/3} \qquad \text{for } \hat\alpha \ge M^{\varepsilon}NY, \qquad (7.28) \]

and

\[ |\lambda_\alpha - \gamma_\alpha| \prec X + (M^{\varepsilon}Y)^{2/3} \qquad \text{for } \hat\alpha \le M^{\varepsilon}NY. \qquad (7.29) \]

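Proof. To simplify notation, we assume that $\alpha \le N/2$ so that $\hat\alpha = \alpha$; the other eigenvalues are handled analogously. Without loss of generality we assume that $\lambda_{N/2} \le 1$. Indeed, the condition $\lambda_{N/2} \le 1$ is equivalent to $n_N(1) \ge 1/2$, which holds with very high probability by Theorem 7.5 and the fact that $n(1) > 1/2$.

The key relation is

\[ n(\gamma_\alpha) = \frac{\alpha}{N} = n_N(\lambda_\alpha) = n(\lambda_\alpha) + O_\prec(Y), \qquad (7.30) \]

where in the last step we used Theorem 7.5. By definition of $n(x)$ we have for $-2 \le x \le 1$ that

\[ n(x) \asymp (2 + x)^{3/2}, \qquad n'(x) \asymp n(x)^{1/3}. \qquad (7.31) \]

Hence for $\alpha \le N/2$ we have

\[ \gamma_\alpha + 2 \asymp \Big(\frac{\alpha}{N}\Big)^{2/3}, \qquad n'(\gamma_\alpha) \asymp \Big(\frac{\alpha}{N}\Big)^{1/3}. \qquad (7.32) \]

Suppose first that $\alpha \ge \alpha_0 := M^{\varepsilon}NY$. Then $n(\gamma_\alpha) \ge M^{\varepsilon}Y$, so that the relation (7.30) implies

\[ |n(\gamma_\alpha) - n(\lambda_\alpha)| \prec Y \le M^{-\varepsilon}n(\gamma_\alpha), \]

which yields $n(\gamma_\alpha) \asymp n(\lambda_\alpha)$. By (7.31), we therefore get that $n'(\gamma_\alpha) \asymp n'(\lambda_\alpha)$ as well. Since $n'$ is nondecreasing, we get $n'(x) \asymp n'(\gamma_\alpha) \asymp n'(\lambda_\alpha)$ for any $x$ between $\gamma_\alpha$ and $\lambda_\alpha$. Therefore, by the mean value theorem, we have

\[ |\gamma_\alpha - \lambda_\alpha| \le \frac{C\,|n(\gamma_\alpha) - n(\lambda_\alpha)|}{n'(\gamma_\alpha)} \prec Y\Big(\frac{N}{\alpha}\Big)^{1/3}, \]

where in the last step we used (7.30) and (7.32). This proves (7.28) for $\alpha \ge M^{\varepsilon}NY$.

For the remaining indices, $\alpha < \alpha_0$, we get from (7.30) the upper bound

\[ 2 + \lambda_\alpha \le 2 + \lambda_{\alpha_0} = 2 + \gamma_{\alpha_0} + O_\prec(Y^{2/3}) \prec (M^{\varepsilon}Y)^{2/3}, \]

where in the second step we used (7.28) and in the last step (7.32). In order to obtain a lower bound, we use Theorem 7.3 to get

\[ -(2 + \lambda_\alpha) \le -(2 + \lambda_1) \prec X. \]

Similar bounds hold for $\gamma_\alpha$ as well:

\[ 0 \le 2 + \gamma_\alpha \le 2 + \gamma_{\alpha_0} \le C(M^{\varepsilon}Y)^{2/3}. \]

Combining these bounds, we obtain

\[ |\lambda_\alpha - \gamma_\alpha| \prec X + (M^{\varepsilon}Y)^{2/3}. \]

This concludes the proof. □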
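The deterministic quantities entering this proof are easy to compute explicitly. The following sketch assumes the standard semicircle density $\varrho(x) = \frac{1}{2\pi}\sqrt{(4 - x^2)_+}$ (consistent with spectral edges at $\pm 2$), for which the distribution function has a closed form; the helper names `n_sc` and `classical_location` are hypothetical. It computes the classical locations $\gamma_\alpha$ by bisection and checks the edge scaling $\gamma_\alpha + 2 \asymp (\alpha/N)^{2/3}$.

```python
import math

def n_sc(E):
    """Distribution function of the semicircle law on [-2, 2] (assumed density)."""
    if E <= -2.0:
        return 0.0
    if E >= 2.0:
        return 1.0
    return 0.5 + E * math.sqrt(4.0 - E * E) / (4.0 * math.pi) + math.asin(E / 2.0) / math.pi

def classical_location(alpha, N, tol=1e-12):
    """Solve n_sc(gamma) = alpha / N by bisection on [-2, 2]."""
    lo, hi = -2.0, 2.0
    target = alpha / N
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if n_sc(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

N = 1000
gammas = [classical_location(a, N) for a in range(1, N + 1)]

# Edge scaling: (gamma_alpha + 2) / (alpha / N)^(2/3) should stay of order one.
ratios = [(gammas[a - 1] + 2.0) / (a / N) ** (2.0 / 3.0) for a in (1, 10, 100, 500)]
print(ratios)
```

The printed ratios remain bounded above and below for all sampled indices $\alpha \le N/2$, in line with the first relation in (7.32).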

Finally, we state a trivial corollary of Theorem 7.6.

Corollary 7.7. Suppose that $\delta_- \ge c$ and that (7.11) and (7.22) hold with some positive control parameters $X, Y \le C$. Then

\[ \sum_{\alpha=1}^{N} |\lambda_\alpha - \gamma_\alpha|^2 \prec NY(Y + X^2). \]
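7.1. Local density of states: proof of Lemma 7.1. In this section we prove Lemma 7.1. Define the empirical eigenvalue distribution

\[ \varrho_N(x) := \frac{1}{N}\sum_{\alpha=1}^{N} \delta(x - \lambda_\alpha), \]

so that we may write

\[ n_N(E) = \frac{1}{N}\,\big|\{\alpha : \lambda_\alpha \le E\}\big| = \int_{-\infty}^{E} \varrho_N(x)\,dx, \qquad m_N(z) = \frac{1}{N}\operatorname{Tr} G(z) = \int \frac{\varrho_N(x)\,dx}{x - z}. \]

We introduce the differences

\[ \varrho^\Delta := \varrho_N - \varrho, \qquad m^\Delta := m_N - m. \]

Following [7], we use the Helffer-Sjöstrand functional calculus. Introduce $\mathcal E := \max\{E_2 - E_1, \widetilde\eta\}$. Let $\chi$ be a smooth cutoff function equal to 1 on $[-\mathcal E, \mathcal E]$ and vanishing on $[-2\mathcal E, 2\mathcal E]^c$, such that $|\chi'(y)| \le C\mathcal E^{-1}$. Let $f$ be a characteristic function of the interval $[E_1, E_2]$ smoothed on the scale $\widetilde\eta$: $f(x) = 1$ on $[E_1 + \widetilde\eta, E_2 - \widetilde\eta]$, $f(x) = 0$ on $[E_1, E_2]^c$, $|f'(x)| \le C\widetilde\eta^{-1}$, and $|f''(x)| \le C\widetilde\eta^{-2}$. Note that the supports of $f'$ and $f''$ have measure $O(\widetilde\eta)$.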

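Then we have the estimate (see Equation (B.13) in [7])

\[ \Big|\int f(\lambda)\,\varrho^\Delta(\lambda)\,d\lambda\Big| \le C\,\Big|\int dx \int_0^\infty dy\,\big(f(x) + yf'(x)\big)\chi'(y)\,m^\Delta(x + iy)\Big| + C\,\Big|\int dx \int_0^{\widetilde\eta} dy\, f''(x)\chi(y)\,y\operatorname{Im} m^\Delta(x + iy)\Big| + C\,\Big|\int dx \int_{\widetilde\eta}^\infty dy\, f''(x)\chi(y)\,y\operatorname{Im} m^\Delta(x + iy)\Big|. \qquad (7.33) \]

Since $\chi'$ vanishes away from $[\mathcal E, 2\mathcal E]$ and $f$ vanishes away from $[E_1, E_2]$, we may apply (7.2) to get

\[ |m^\Delta(x + iy)| \prec \frac{1}{My} \qquad (7.34) \]

uniformly for $x \in [E_1, E_2]$ and $y \ge \widetilde\eta$. Thus the first term on the right-hand side of (7.33) is bounded by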



C 
M£ 



2£ 



dx / dy\f{x) + yr{x)\ ^ 



M 



(7.35) 



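In order to estimate the two remaining terms of (7.33), we estimate $\operatorname{Im} m^\Delta(x + iy)$. If $y \ge \widetilde\eta$ we may use (7.34). Consider therefore the case $0 < y \le \widetilde\eta$. From Lemma 4.3 we find

\[ |\operatorname{Im} m(x + iy)| \le C\sqrt{\kappa_x + y}. \qquad (7.36) \]

By spectral decomposition of $H$, it is easy to see that the function $y \mapsto y\operatorname{Im} m_N(x + iy)$ is monotone increasing. Thus we get, using (7.36), $x + i\widetilde\eta \in \widetilde{\mathbf S}$, and (7.2), that

\[ y\operatorname{Im} m_N(x + iy) \le \widetilde\eta\operatorname{Im} m_N(x + i\widetilde\eta) \prec \widetilde\eta\sqrt{\kappa_x + \widetilde\eta} + \frac{1}{M} \qquad (7.37) \]

for $y \le \widetilde\eta$ and $x \in [E_1, E_2]$. Using $m^\Delta = m_N - m$ and recalling (7.36), we therefore get

\[ |y\operatorname{Im} m^\Delta(x + iy)| \prec \widetilde\eta\sqrt{\kappa_x + \widetilde\eta} + \frac{1}{M} \qquad (7.38) \]

for $y \le \widetilde\eta$ and $x \in [E_1, E_2]$. The second term of (7.33) is therefore bounded by

\[ \Big(\widetilde\eta\sqrt{\kappa_x + \widetilde\eta} + \frac{1}{M}\Big)\int dx\,|f''(x)| \int_0^{\widetilde\eta} dy\,\chi(y) \le C\Big(\widetilde\eta\sqrt{\kappa_x + \widetilde\eta} + \frac{1}{M}\Big). \]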
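The monotonicity of $y \mapsto y\operatorname{Im} m_N(x + iy)$ used above follows from the spectral representation $y\operatorname{Im} m_N(x + iy) = \frac{1}{N}\sum_\alpha \frac{y^2}{(\lambda_\alpha - x)^2 + y^2}$, in which every summand is increasing in $y$. The short sketch below checks this numerically; the spectrum `eigs` and the point `x` are hypothetical values chosen purely for illustration.

```python
def y_im_m(eigs, x, y):
    """Return y * Im m_N(x + iy) = (1/N) * sum_a y^2 / ((lambda_a - x)^2 + y^2)."""
    return sum(y * y / ((lam - x) ** 2 + y * y) for lam in eigs) / len(eigs)

eigs = [-1.7, -0.9, -0.2, 0.1, 0.8, 1.5]        # hypothetical spectrum
x = 0.3                                         # hypothetical energy
ys = [10.0 ** (-k) for k in range(6, -2, -1)]   # increasing grid: 1e-6, ..., 1, 10
values = [y_im_m(eigs, x, y) for y in ys]
print(values)
```

The computed values increase along the grid and stay between 0 and 1, as the formula predicts.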
In order to estimate the third term on the right-hand side of (7.33), we integrate by parts, first in $x$ and then in $y$, to obtain the bound



C 



/d./,.)?R«„.-(. + .5)|+c|/d./^"d„/' 



(a;)x'(y)yRe m^{x + iy) 



C 



Ax \ dyf{x)x{y)Rem^{x + iy) 



(7.39) 



The second term of (7.39) is similar to the first term on the right-hand side of (7.33), and is easily seen to be bounded by $1/M$ as in (7.35).

In order to bound the first and third terms of (7.39), we estimate, for any $y \le \widetilde\eta$,



m^(a; -I- iy) I ^ |m^(a; + i?]) | + / du (^\dumN{x + m)\ + \dum{x + iu)\^ 



(7.40) 



Moreover, using the monotonicity of y yliiimpf{x + iy) and the identity I Gyp ~ rj ^ IraGu , we find 
for any u ^rj that 



\dumNix + iu)\ = 



N 



TrG^ix + iu) 



^ —'S^\Gi.j(x + m]\'^ = — Im (.T + iw) ^ rjlmniNix + irf) . 



Similarly, we find from (2.7) that 



1 ~ Crj 



Thus (7.40) and (7.34) yield 



(7.41) 



where we also used that rj ^ M ^. Using (7.41) for y — t], we may now estimate the first term of (7.39) by 



What remains is the third term of (7.391, which can be estimated, using (7.34 1, by 



2£ 



1 



dx / dy|/'(a;)|— CM-\l + \logJj\) sC CArHogM . 
Jfy My 



Summarizing, we have proved that 

/(A) e^(A)dA 



1 ^ / ^ logAf 

^ T7 + ^iVi^x + 11 + 7]^ — < 77 -I- 



M 



M 



1 



(7.42) 



Since Imr7i7v(x -I- \rf) controls the local density on scale 77, we may estimate \nM{E) — n{E)\ using (7.37) 
according to 



M 



Thus we get 

njv(-Bi) - nA,(£;2) - J /(A) ^^^^(A) dA 

Similarly, since g has a bounded density, we find 

n(E,)~n{E2)- I f{X)g{X)dX 



i=l,2 



< C7j. 



Together with (7.42) and recalling $\widetilde\eta \ge M^{-1}$, we therefore get (7.4). This concludes the proof of Lemma 7.1.
8. Bulk universality

Local eigenvalue statistics are described by correlation functions on the scale $1/N$. Fix an integer $n \ge 2$ and an energy $E \in (-2, 2)$. Abbreviating $\mathbf x = (x_1, x_2, \ldots, x_n)$, we define the local correlation function

\[ f_N^{(n)}(E, \mathbf x) := \frac{1}{\varrho(E)^n}\, p_N^{(n)}\Big(E + \frac{x_1}{N\varrho(E)}, E + \frac{x_2}{N\varrho(E)}, \ldots, E + \frac{x_n}{N\varrho(E)}\Big), \qquad (8.1) \]

where $p_N^{(n)}$ is the $n$-point correlation function of the $N$ eigenvalues and $\varrho(E)$ is the density of the semicircle law defined in (2.7). Universality of the local eigenvalue statistics means that, for any fixed $n$, the limit as $N \to \infty$ of the local correlation function $f_N^{(n)}$ only depends on the symmetry class of the matrix entries, and is otherwise independent of their distribution. In particular, the limit of $f_N^{(n)}$ coincides with that of a GOE or GUE matrix, which is explicitly known. In this paper, we consider local correlation functions averaged over a small energy interval of size $\ell = N^{-\varepsilon}$,

\[ \widetilde f_N^{(n)}(E, \mathbf x) := \frac{1}{2\ell}\int_{E - \ell}^{E + \ell} f_N^{(n)}(E', \mathbf x)\,dE'. \qquad (8.2) \]

Universality is understood in the sense of the weak limit, as $N \to \infty$ for fixed $|E| < 2$, of $\widetilde f_N^{(n)}(E, \mathbf x)$ in the variables $\mathbf x$.



The general approach developed in [11, 12, 14] to prove the universality of the local eigenvalue statistics in the bulk spectrum of a general Wigner-type matrix consists of three steps.

(i) A rigidity estimate on the locations of the eigenvalues, in the sense of a quadratic mean.

(ii) The spectral universality for matrices with a small Gaussian component, via local ergodicity of the Dyson Brownian motion (DBM).

(iii) A perturbation argument that removes the small Gaussian component by comparing Green functions.

In this paper we do not give the details of steps (ii) and (iii), since they have been concisely presented elsewhere, e.g. in [13]. Here we only summarize the results and the key arguments of steps (ii) and (iii) for the general class of matrices we consider. In this section we assume that $H$ is either real symmetric or complex Hermitian. The former case means that the entries of $H$ are real. The latter means, loosely, that its off-diagonal entries have a nontrivial imaginary part. More precisely, in the complex Hermitian case we shall replace the lower bound on the variances $s_{ij}$ from Definition 3.1 with the following, stronger, condition.

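Definition 8.1. We call the Hermitian matrix $H$ a complex $a$-full Wigner matrix if for each $i, j$ the $2 \times 2$ covariance matrix

\[ \sigma_{ij} := \begin{pmatrix} \mathbb E(\operatorname{Re} h_{ij})^2 & \mathbb E(\operatorname{Re} h_{ij})(\operatorname{Im} h_{ij}) \\ \mathbb E(\operatorname{Re} h_{ij})(\operatorname{Im} h_{ij}) & \mathbb E(\operatorname{Im} h_{ij})^2 \end{pmatrix} \]

satisfies

\[ \sigma_{ij} \ge \frac{a}{N} \]

as a symmetric matrix. Note that this condition implies that $H$ is $a$-full, but the converse is not true.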
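The matrix inequality in Definition 8.1 amounts to a lower bound on the smallest eigenvalue of $\sigma_{ij}$. The sketch below makes this concrete and illustrates why the converse fails; the function names and the numerical values are hypothetical, and we read "$a$-full" as the scalar condition $s_{ij} = \mathbb E|h_{ij}|^2 \ge a/N$.

```python
import math

def min_eigenvalue_2x2(p, q, r):
    """Smallest eigenvalue of the real symmetric matrix [[p, q], [q, r]]."""
    return 0.5 * (p + r) - math.sqrt((0.5 * (p - r)) ** 2 + q * q)

def is_complex_a_full(p, q, r, a, N):
    """Check sigma >= a/N in the sense of symmetric matrices."""
    return min_eigenvalue_2x2(p, q, r) >= a / N

N = 100
a = 0.4

# Real and imaginary parts uncorrelated with equal variances 0.5/N.
balanced = is_complex_a_full(0.5 / N, 0.0, 0.5 / N, a, N)

# Purely real entry with variance 1/N: s_ij = 1/N >= a/N, so the scalar
# a-full condition holds, but the covariance matrix is singular.
real_only = is_complex_a_full(1.0 / N, 0.0, 0.0, a, N)
print(balanced, real_only)
```

The first case satisfies the condition, while the second does not even though its total variance $s_{ij} = 1/N$ exceeds $a/N$.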
We consider a stochastic flow of Wigner-type matrices generated by the Ornstein-Uhlenbeck equation

\[ dH_t = \frac{1}{\sqrt N}\,dB_t - \frac{1}{2}H_t\,dt \]

with some given initial matrix $H_0$. Here $B$ is an $N \times N$ matrix-valued standard Brownian motion with the same symmetry type as $H$. The resulting dynamics on the level of the eigenvalues is Dyson Brownian motion (DBM). It is well known that $H_t$ has the same distribution as the matrix

\[ e^{-t/2}H_0 + (1 - e^{-t})^{1/2}\,U, \qquad (8.3) \]
where $U$ is an independent standard Gaussian Wigner matrix of the same symmetry class as $H$. In particular, $H_t$ converges to $U$ as $t \to \infty$. The eigenvalue distribution converges to the Gaussian equilibrium measure, whose density is explicitly given by

\[ \mu(\boldsymbol\lambda) = \frac{1}{Z}\,e^{-\beta N\mathcal H(\boldsymbol\lambda)}\,d\boldsymbol\lambda, \qquad \mathcal H(\boldsymbol\lambda) := \sum_{i=1}^{N}\frac{\lambda_i^2}{4} - \frac{1}{N}\sum_{i<j}\log|\lambda_i - \lambda_j|; \]

here $\beta = 1$ for the real symmetric case (GOE) and $\beta = 2$ for the complex Hermitian case (GUE).

The matrix $S^{(t)}$ of variances of $H_t$ is given by

\[ S^{(t)} = e^{-t}S^{(0)} + (1 - e^{-t})\,ee^*, \]

where $S^{(0)}$ is the matrix of variances of $H_0$. It is easy to see that the gaps $\delta_\pm(t)$ of $S^{(t)}$ satisfy $\delta_\pm(t) \ge \delta_\pm(0)$; therefore the corresponding parameters (2.11) satisfy $\widetilde\Gamma_t(z) \le \widetilde\Gamma_0(z)$. Since all estimates behind our main theorems in Sections 2 and 7 improve if $\delta_\pm$ increase, it is immediate that all results in these sections hold for $H_t$ provided they hold for $H_0$.
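The effect of the flow on the variance matrix can be seen on a small example. The sketch below assumes $e = N^{-1/2}(1, \ldots, 1)^*$, so that $ee^*$ has all entries equal to $1/N$, and takes as $S^{(0)}$ the block example with $L = N/2$ discussed in Section 7 (a hypothetical choice, used because it has the eigenvalue $-1$). Every eigenvector of $S^{(0)}$ orthogonal to $e$ has its eigenvalue multiplied by $e^{-t}$, so the eigenvalue $-1$ moves to $-e^{-t}$ and the spectrum moves away from $-1$.

```python
import math

def variance_flow(S0, t):
    """Return S(t) = e^{-t} S0 + (1 - e^{-t}) e e^*, with e = N^{-1/2}(1,...,1)."""
    N = len(S0)
    w = math.exp(-t)
    return [[w * S0[i][j] + (1.0 - w) / N for j in range(N)] for i in range(N)]

def matvec(S, v):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(s * x for s, x in zip(row, v)) for row in S]

# Illustrative S0: zero diagonal blocks, L = N/2, so that S0 v = -v.
L, N = 2, 4
S0 = [[0.0 if (i < L) == (j < L) else 1.0 / L for j in range(N)] for i in range(N)]
v = [1.0] * L + [-1.0] * (N - L)   # orthogonal to e

t = 0.5
St = variance_flow(S0, t)
row_sums = [sum(row) for row in St]   # remain equal to 1
Stv = matvec(St, v)                   # equals -e^{-t} v
print(row_sums, Stv)
```

The row sums stay equal to one, and `Stv` equals $-e^{-t}v$, so for $t > 0$ the matrix $S^{(t)}$ no longer has $-1$ in its spectrum.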

The key quantity to be controlled when establishing bulk universality is the mean quadratic distance of 
the eigenvalues from their classical locations, 

g maxE(*)^^(A,-7.)^ (8.4) 



where $\mathbb E^{(t)}$ denotes the expectation with respect to the ensemble $H_t$. By Corollary 7.7 we have

\[ Q \le N^{\varepsilon}\,Y(Y + X^2) \]

for any $\varepsilon > 0$ and $N \ge N_0(\varepsilon)$. Here we used that the estimate from Corollary 7.7 is uniform in $t$, by the remark in the previous paragraph.

We modify the original DBM by adding a local relaxation term of the form ^ '^iiK — li)'^ to the original 
Hamiltonian Ji, which has the effect of artificially speeding up the relaxation of the dynamics. Here r ^ 1 
is a small parameter, the relaxation time of the modified dynamics. We choose r := N^'^'^^Q for some e > 0. 
As Theorem 4.1 of [l2] (see also Theorem 2.2 of [Ts]) shows, the local statistics of the eigenvalue gaps of Ht 
and GUE/GOE coincide if t > iV^r = N^+^^Q, i.e. if 

t ^ N^+'^^YiY + X^) . (8.5) 

The local statistics is averaged over $N^{1-\varepsilon}$ consecutive eigenvalues or, alternatively, in the energy parameter $E$ over an interval of length $N^{-\varepsilon}$.

To complete the programme (i)-(iii), we need to compare the local statistics of the original ensemble $H$ and $H_t$, i.e. perform step (iii). We first recall the Green function comparison theorem from [14] for the case $M \asymp N$ (generalized Wigner). The result states, roughly, that expectations of Green functions with spectral parameter $z$ satisfying $\operatorname{Im} z \ge N^{-1-\varepsilon}$ are determined by the first four moments of the single-entry distributions. Therefore the local eigenvalue statistics on a very small scale, $\eta = N^{-1-\varepsilon}$, of two Wigner ensembles are indistinguishable if the first four moments of their matrix entries match. More precisely, for the local $n$-point correlation functions (8.1) to match, one needs to compare expectations of $n$-th order monomials of the form

\[ \prod_{k=1}^{n} m_N(E_k + i\eta), \qquad (8.6) \]

where the energies $E_k$ are chosen in the bulk spectrum with $E_k - E_{k'} = O(1/N)$. (Recall that $m_N(z) = \frac{1}{N}\operatorname{Tr} G(z)$.)

The proof uses a Lindeberg-type replacement strategy to change the distribution of each matrix entry $h_{ij}$ one by one in a telescopic sum. The error resulting from each replacement is estimated using a fourth order resolvent expansion, where all resolvents $G(z) = (H - z)^{-1}$ with $z = E_k + i\eta$ appearing in (8.6) are expanded with respect to the single matrix entry $h_{ij}$ (and its conjugate $h_{ji} = \bar h_{ij}$). If the first four moments of the two distributions match, then the terms of at most fourth order in this expansion remain unchanged by each replacement. The error term is of order $\mathbb{E}|h_{ij}|^5 \asymp N^{-5/2}$, which is negligible even after summing up all pairs of indices $(i,j)$. This estimate assumes that the resolvent entries in the expansion (and hence all factors $m_N(z)$ in (8.6)) are essentially bounded.
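The telescoping structure of a Lindeberg-type replacement can be illustrated in a much simpler scalar setting. The following is only a toy sketch, not the resolvent argument of the text: the two entry laws `LAW_X` and `LAW_Y`, the test functions, and all function names are assumptions made for illustration. The two laws share their first three moments and differ in the fourth, so each single replacement changes the expectation of a fourth-order polynomial by exactly the difference of the fourth moments, while a second-order polynomial is unaffected.

```python
from fractions import Fraction
from itertools import product

# Two toy single-entry laws, given as lists of (value, probability).
# Both have mean 0, variance 1 and vanishing third moment; the fourth moments differ.
LAW_X = [(-1, Fraction(1, 2)), (1, Fraction(1, 2))]                       # E X^4 = 1
LAW_Y = [(-2, Fraction(1, 8)), (0, Fraction(3, 4)), (2, Fraction(1, 8))]  # E Y^4 = 4


def expectation(laws, f):
    """Exact expectation of f(z_1, ..., z_n) for independent z_i with the given laws."""
    total = Fraction(0)
    for outcome in product(*laws):
        prob = Fraction(1)
        for _, p in outcome:
            prob *= p
        total += prob * f([value for value, _ in outcome])
    return total


def lindeberg_increments(n, f):
    """Replace X-entries by Y-entries one at a time; return the n telescoping increments."""
    hybrids = [expectation([LAW_Y] * g + [LAW_X] * (n - g), f) for g in range(n + 1)]
    return [hybrids[g] - hybrids[g - 1] for g in range(1, n + 1)]


def fourth_power_of_sum(z):
    return sum(z) ** 4


def square_of_sum(z):
    return sum(z) ** 2


increments = lindeberg_increments(4, fourth_power_of_sum)
total_change = (expectation([LAW_Y] * 4, fourth_power_of_sum)
                - expectation([LAW_X] * 4, fourth_power_of_sum))
```

The increments add up to the total change, which is the telescopic sum. In the matrix setting of the text the moments match up to fourth order, so the first non-matching contribution per replacement is of fifth order.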

The Green function comparison method therefore has two main ingredients. First, a high probability a priori estimate is needed on the resolvent entries at any spectral parameter $z$ with imaginary part $\eta$ slightly below $1/N$:

$$\max_{i,j}|G_{ij}(E + i\eta)| \prec N^{2\epsilon} \qquad (\eta \ge N^{-1-\epsilon}) \qquad (8.7)$$



for any small $\epsilon > 0$. Clearly, the same estimate also holds for $m_N(E + i\eta)$. The bound (8.7) is typically obtained from the local semicircle law for the resolvent entries, (2.18). Although the local semicircle law is effective only for $\operatorname{Im} z \gg 1/N$, it still gives an almost optimal bound for a somewhat smaller $\eta$ by using the trivial estimate

$$\max_{i,j}|G_{ij}(E + i\eta)| \le \log N\,\frac{\eta'}{\eta}\,\sup_{\eta'' \ge \eta'}\max_i \operatorname{Im} G_{ii}(E + i\eta'') \qquad (\eta \le \eta') \qquad (8.8)$$

with the choice of $\eta' = N^{-1+\epsilon}$. The proof of (8.8) follows from a simple dyadic decomposition; see the proof of Theorem 2.3 in Section 8 of [14] for details.

The second ingredient is the construction of an initial ensemble $H_0$ whose time evolution $H_t$ for some $t \ll 1$ satisfying (8.5) is close to $H$; here closeness is measured by the matching of moments of the matrix entries between the ensembles $H$ and $H_t$. We shall choose $H_0$, with variance matrix $S^{(0)}$, so that the second moments of $H$ and $H_t$ match,

$$S = e^{-t}S^{(0)} + (1 - e^{-t})\,ee^*, \qquad (8.9)$$



and the third and fourth moments are close. They have to be so close that even after multiplication with at most five resolvent entries and summing up for all $i,j$ indices, their difference is still small. (Five resolvent entries appear in the fourth order of the resolvent expansion of $G$.) Thus, given (8.7), we require that

$$\max_{i,j}\big|\mathbb{E} h_{ij}^s - \mathbb{E}^{(t)} h_{ij}^s\big| \le N^{-2-(2n+9)\epsilon} \qquad (s = 3,4) \qquad (8.10)$$

to ensure that the expectations of the $n$-fold product in (8.6) are close. This formulation holds for the real symmetric case; in the complex Hermitian case all moments of order $s = 3, 4$ involving the real and imaginary parts of $h_{ij}$ have to be approximated. To simplify notation, we work with the real symmetric case in the sequel.

The matching can be done in two steps. In the first we construct a matrix of variances $S^{(0)}$ such that (8.9) holds. This first step is possible if, given $S$ associated with $H$, (8.9) can be satisfied for a doubly stochastic $S^{(0)}$, i.e. if $H$ is an $a$-full Wigner matrix and

$$a \ge Ct \qquad (8.11)$$

with some large constant $C$. For the complex Hermitian case, the condition (8.11) is the same but $H$ has to be a complex $a$-full Wigner matrix (see Definition 8.1).

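In the second step of moment matching, we use Lemma 3.4 of [15] to construct an ensemble $H_0$ with variances $S^{(0)}$, such that the entries of $H$ and $H_t$ satisfy

$$\mathbb{E} h_{ij}^2 = \mathbb{E}^{(t)} h_{ij}^2 = s_{ij}, \qquad \mathbb{E} h_{ij}^3 = \mathbb{E}^{(t)} h_{ij}^3, \qquad \big|\mathbb{E} h_{ij}^4 - \mathbb{E}^{(t)} h_{ij}^4\big| \le C t s_{ij}^2.$$

This means that (8.10) holds if

$$C t s_{ij}^2 \le N^{-2-(2n+9)\epsilon}.$$

Suppose that $H$ is $b$-flat, i.e. that $s_{ij} \le b/N$. Then this condition holds provided

$$C t b^2 \le N^{-(2n+9)\epsilon}. \qquad (8.12)$$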
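The passage from the entrywise fourth-moment condition to (8.12) is a one-line substitution. The display below spells it out as a sketch, assuming only the flatness bound $s_{ij} \le b/N$ and the target accuracy $N^{-2-(2n+9)\epsilon}$ from (8.10).

```latex
\[
  C t\, s_{ij}^{2} \;\le\; C t \Big(\frac{b}{N}\Big)^{2} \;=\; \frac{C t\, b^{2}}{N^{2}},
  \qquad\text{and}\qquad
  \frac{C t\, b^{2}}{N^{2}} \;\le\; N^{-2-(2n+9)\epsilon}
  \;\Longleftrightarrow\;
  C t\, b^{2} \;\le\; N^{-(2n+9)\epsilon}.
\]
```

In words, the factor $N^{-2}$ coming from the size of the variances cancels against the $N^{-2}$ in the required accuracy, so that only the product $t b^2$ has to be small.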



The argument so far assumed that $M \asymp N$ ($H$ is a generalized Wigner matrix), in which case $G_{ij}(E + i\eta')$ remains essentially bounded down to the scale $\eta' \approx 1/N$. If $M \ll N$, then (2.18) provides control only down to scale $\eta' \gg 1/M$ and (8.8) gives only the weaker bound

$$|G_{ij}(E + i\eta)| \prec \frac{1}{M\eta} \qquad (8.13)$$

for any $\eta \le 1/M$, which replaces (8.7). Using this weaker bound, the condition (8.12) is replaced with

Ctb^ -< {Mr]Y+^ , (8.14) 



N 



needed for $n$-fold products of the form (8.6) to be close. (For convenience, here we use the notation $A_N \prec B_N$ even for deterministic quantities to indicate that $A_N \le N^{\epsilon} B_N$ for any $\epsilon > 0$ and $N \ge N_0(\epsilon)$.) The bound (8.14) thus guarantees that, for any fixed $n$, the expectations of the $n$-fold products of the form (8.6) with respect to the ensembles $H$ and $H_t$ are close. Following the argument in the proof of Theorem 6.4 of [m], this means that for any smooth, compactly supported function $O : \mathbb{R}^n \to \mathbb{R}$, the expectations of observables

$$O_\eta\big(N(\lambda_{i_1} - E), N(\lambda_{i_2} - E), \dots, N(\lambda_{i_n} - E)\big) \qquad (8.15)$$

are close, where the smeared out observable $O_\eta$ on scale $\eta$ is defined through



1 



(ttTV)" 



dcki • • • da„ 0(ai, . . . ,a„) 6*,, 



N 



j{x) ■■= Im ■ 



X — irj 



To conclude the result for observables with O instead of in (8.15), we need to estimate, for both 
ensembles, the difference 



E {0~0^)(^N{X,,^E),N{\,~E),...,NiX, 
Due to the smoothness of O, we can decompose O — O,, — Qi + Q2, where 



E) 



(8.16) 



and 



n n 



3=1 j=i ^ ' 

with an arbitrary parameter $K \ge N/M$. Here the constants depend on $O$. The contribution from $Q_1$ to (8.16) can thus be estimated by

where we used that the expected number of eigenvalues in the interval $[E - K/N, E + K/N]$ is $O(K)$, since (8.13) guarantees that the density is bounded on scales larger than $1/M$. The contribution from $Q_2$ to (8.16)

Y Q2(...) -< CK-' 



is estimated by 



E 



In the last step we used (|8.13|) to estimate 

1 



El 



1 



N 
M 



Optimizing the choice of K and 77, (8.14) becomes 



Ctb^ -< 



.17) 



(8.18) 



(8.19) 



Summarizing the conditions (8.5), (8.11), and (8.191, we require that 

(n^ + l)(n+4) 



N'+^^YiY + X^) -< min<^a,6 



in order to have bulk universality. We have therefore proved the following result.

Theorem 8.2. Suppose that $H$ is $N/M$-flat and $a$-full (in the real symmetric case) or complex $a$-full (in the complex Hermitian case). Suppose moreover that (7.11) and (7.22) hold with some positive control parameters $X, Y \le C$. Fix an arbitrary positive parameter $\epsilon > 0$. Then the local $n$-point correlation functions of $H$, averaged over the energy parameter in an interval of size $N^{-\epsilon}$ around $|E| < 2$ (see (8.2)), coincide with those of GOE or GUE provided that



X ) ^ min< a 



(n^ + l)(n+4)+2 



N 



(8.20) 



In particular, if N^/'^ M iV then (|7.11|) and (|7.22|) hold with X and Y defined in (|7.12|) and (|7.23|). 



We conclude with a few examples illustrating Theorem 8.2.

Corollary 8.3. Fix an integer n ^ 2. There exists a positive number p{n) ^ cn^^ with the following 
property. Suppose that H satisfies any of the following conditions for some sufficiently small ^ > 0. 

(t) cN-^-i sC s,j sC CA^-i+p(")-«. 
(ii) cN-'i+^ s,j 

(Hi) H is a on e- di mensional band matrix with band width W with a mean-field component of size v (see 
Definition\33(j such that W ^ Afi-p(")+? and v > N^^'-^W'^^ . 

Then there exists an $\epsilon > 0$ (depending on $\xi$ and $n$) such that the local $n$-point correlation functions of $H$, averaged over the energy parameter in an interval of size $N^{-\epsilon}$ around $|E| < 2$, coincide with those of GOE or GUE (depending on the symmetry class of $H$).

We remark that the conditions for the upper bound on $s_{ij}$ in parts (i) and (iii) are similar. But the band structure in (iii) allows one to choose a much smaller mean-field component than in (i).



Pro of. In Case (i), we have a = cN ^ and b — N/M in Definition 3.1 hence 5± ^ cN ^ by Proposition 
Therefore Y = M-^N~'^^''^ and X = N'^M-^'^ from (ItII and (|7.23|), so that 



A.3 



.20) reads 



AM 



N 

M\M 



iV4 



M16/3 



M\ (n'' + l)(n+4)+2 



By Theorem 8.2 bulk universality therefore holds provided that $M \ge N^{1-p(n)+\xi}$ with any sufficiently small positive $\xi > 0$ (and $\epsilon$ chosen appropriately, depending on $\xi$ and $n$). The function $p(n)$ can be easily computed.

We remark th at if we additionally assume that hij has a symmetric law with subexponential decay ( 7.14 1, 
then by Theorem 7.4 we can use the improved control parameter X — M^'^/'^ . This yields a better threshold 
p{n). For example, for n = 2 we obtain p{n) = 

In Case (ii) we take M ^ N, i.e. 6 = c and S+ ^ a = A^-i/8+?^ Then with the choice ( |7.12[ ) and ( |7.23[ ) 
we have Y < CN-^S^'^^^, X ^ CiV-^/s + CN-^{S+ + A^-i/^)-i^ so that 



5.20 ) reads 



6-'^'[N-'d-'/' + N-^^^ + N-^{S+ + N-'^')-'^) « a, 

which holds since 5+ ^ a ^ N^^^^. 

Finally, in Case (iii) we have ly x M, 6 = N/M, a = v, 6+ ^ cv + c{M/Nf, and 5^ ^ c. Since 
M ^ 7V22/23 Yvme 5+ > cAf-^/^, Thus, with the choice (|7.12| and \7.2i\, we have 



Y 



1 



M5 



7/2 



^ c 



MS 



and (8.20) reads 



M8 VM8 



N 



52 



M56 



X c c 



<^ min< ly 



M8/3 



C 



M\ (" +l)("+4)+2 

iV 



This leads to the conditions 



N 



15 



M16' 



M > iVi-P(") 



with some positive p{n), which concludes the proof. 



(8.21) 

□

A. Behaviour of $\Gamma$ and $\tilde\Gamma$

In this section we give basic bounds on the parameters $\Gamma$ and $\tilde\Gamma$. As it turns out, their behaviour is intimately linked with the spectrum of $S$, more precisely with its spectral gaps. Recall that the spectrum of $S$ lies in $[-1, 1]$, with 1 being a simple eigenvalue.

Definition A.1. Let $\delta_-$ be the distance from $-1$ to the spectrum of $S$, and $\delta_+$ the distance from 1 to the spectrum of $S$ restricted to $e^\perp$. In other words, $\delta_\pm$ are the largest numbers satisfying

$$S \ge -1 + \delta_-, \qquad S|_{e^\perp} \le 1 - \delta_+.$$

The following proposition gives explicit bounds on $\Gamma$ and $\tilde\Gamma$ depending on the spectral gaps $\delta_\pm$. We recall the notations $z = E + i\eta$, $\kappa := \big||E| - 2\big|$ and the definition of $\theta$ from (3.2).

Proposition A.2. There is a universal constant $C$ such that the following holds uniformly in the domain $\{z = E + i\eta : |E| \le 10,\ M^{-1} \le \eta \le 10\}$, and in particular in any spectral domain $\mathbf{D}$.



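(i) We have the estimate

$$\frac{1}{C\sqrt{\kappa + \eta}} \le \Gamma(z) \le \frac{C \log N}{1 - \max_\pm \big|\frac{1 \pm m^2}{2}\big|} \le \frac{C \log N}{\min\{\eta + E^2, \theta\}}. \qquad (A.1)$$

(ii) In the presence of a gap $\delta_-$ we may improve the upper bound to

$$\Gamma(z) \le \frac{C \log N}{\min\{\delta_- + \eta + E^2, \theta\}}. \qquad (A.2)$$

(iii) For $\tilde\Gamma$, we have

$$C^{-1} \le \tilde\Gamma(z) \le \frac{C \log N}{\min\{\delta_- + \eta + E^2, \delta_+ + \theta\}}. \qquad (A.3)$$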



(iv) Finally, we have the alternative upper bound 

f(z) s=: F(z) sc 



c 



N 



min± 5± + -^k + 77 V M 



(A.l) 



(A.2) 



(A.3) 



(A.4) 



We remark that in our applications, the alternative bound (A.4) yields a slight improvement in certain 
regimes (in particular, it slightly improves some exponents in Corollary 8.3), but we shall not pursue this 
direction, and instead only use the bounds (A.1)-(A.3). 



Proof. The first bound of (A.l) follows from (1 — rn^S) = (1 — m^) combined with (4.3). In order 

1 



to prove the second bound of ( A.l ), we write 



1 



1 



1 - m?S 2 1 - 1+™^ 
2 



and observe that 



Therefore 



l + m^S* 



^ max 



l±m2 



(A.5) 



1 



1 - m'^S 



ji=0 



l + rn^S 



ClogA^ 
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^ 1 and 



where in the last step we chose no = ^oijN j^^^^ 

enough Cq. Here we used that \\S\\ 
(4.2) to estimate the summands in the first sum. This concludes the proof of the second bound of (A.l|. 
The third bound of ( A.l I follows from the elementary estimates 

1 — 



< l-c{7^ + E')., 





1 -c(^ 


2 





Im m + r/ 



(A.6) 



for some universal constant $c > 0$, where in the last step we used Lemma 4.3.

The estimate (A.2) follows similarly. Due to the gap $\delta_-$ in the spectrum of $S$, we may replace the estimate

< max-^ 1 - 5_ - ?7 - i?^ , 



(A. 5 1 with 



l + m^S- 
2 



1 + 
2 



(A.7) 



Hence (A.2) follows using (A.6). 



The lower bound of (A.3) was proved in (4.5). The upper bound is proved similarly to (A.2), except that



(A.7 1 is replaced with 

1 + m^S 



^ max< 1 — (5_ — r/ — , min< 1 — S 



1 + TO^ 



Finally, in order to prove (|A.4|, we estimate 
1 



1 — rn?S 



^ 1 



S 



1 



1 — rn?S 



< l + C||5||p^,o 



1 — rn?-S 



>£2 



The factors on the right-hand side may be estimated as 

1/2 



= max 



(E4) < mil 



and 



rrflS 



c 



xe[-i+s^,i-s+\ |1 — x'm?{z)\ min± 5± + -^k + rj ' 



where in the last step we used (4.3). This concludes the proof of (A.4). 



□ 



The following proposition gives the behaviour of the spectral gaps $\delta_\pm$ for the example matrices from Section 3.

Proposition A.3 (Spectrum of $S$ for example matrices).

(i) If $H$ is an $a$-full Wigner matrix then $\delta_- \ge a$ and $\delta_+ \ge a$.

(ii) If $H$ is a band matrix there is a positive constant $c$, depending on the dimension $d$ and the profile function $f$, such that $\delta_- \ge c$ and $\delta_+ \ge c(W/L)^2$.

(iii) If $H = \sqrt{1 - \nu}\,H_B + \sqrt{\nu}\,H_W$, where $H_B$ is a band matrix, $H_W$ is an $a$-full Wigner matrix independent of $H_B$, and $\nu \in [0, 1]$ (see Definition 3.3), then there is a constant $c$ depending only on the dimension $d$ and the profile function $f$ of $H_B$, such that $\delta_- \ge c$ and $\delta_+ \ge c(W/L)^2 + \nu a$.

Proof. For the case where $H$ is an $a$-full Wigner matrix, the claim easily follows by splitting

$$S = (S - a\,ee^*) + a\,ee^*.$$

By assumption, the first term is $(1 - a)$ times a doubly stochastic matrix. Hence its spectrum lies in $[-1 + a, 1 - a]$. The claims on $\delta_\pm$ now follow easily.

The claims about band matrices were proved in Lemma A.1 of [14] and Equation (5.16) of [s], respectively. Finally, (iii) easily follows from (i) and (ii). □
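The splitting used for part (i) can be checked by hand on a small example. The sketch below is not taken from the paper: the toy matrix (a mixture of the flat matrix $ee^*$ and a nearest-neighbour walk on a cycle), the value $a = 1/4$, the reading of "$a$-full" as the entrywise bound $s_{ij} \ge a/N$, and all function names are assumptions made for illustration. It verifies with exact rational arithmetic that $(S - a\,ee^*)/(1-a)$ is doubly stochastic and that Rayleigh quotients on $e^\perp$ stay inside $[-1+a, 1-a]$.

```python
from fractions import Fraction


def flat_matrix(n):
    """Return ee^* for e = n^{-1/2}(1, ..., 1): every entry equals 1/n."""
    return [[Fraction(1, n)] * n for _ in range(n)]


def cycle_matrix(n):
    """Symmetric doubly stochastic matrix of a nearest-neighbour walk on a cycle."""
    b = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        b[i][(i + 1) % n] += Fraction(1, 2)
        b[i][(i - 1) % n] += Fraction(1, 2)
    return b


def mix(a, n):
    """Toy variance matrix S = a * ee^* + (1 - a) * B, so that s_ij >= a/n."""
    e, b = flat_matrix(n), cycle_matrix(n)
    return [[a * e[i][j] + (1 - a) * b[i][j] for j in range(n)] for i in range(n)]


def remainder(s, a):
    """Return (S - a ee^*) / (1 - a), the first term of the splitting, rescaled."""
    n = len(s)
    return [[(s[i][j] - a / n) / (1 - a) for j in range(n)] for i in range(n)]


def is_doubly_stochastic(m):
    """Check nonnegative entries with all row sums and column sums equal to one."""
    n = len(m)
    rows_ok = all(sum(row) == 1 for row in m)
    cols_ok = all(sum(m[i][j] for i in range(n)) == 1 for j in range(n))
    return rows_ok and cols_ok and all(x >= 0 for row in m for x in row)


def rayleigh(s, v):
    """Rayleigh quotient <v, S v> / <v, v> for an integer or Fraction vector v."""
    n = len(s)
    sv = [sum(s[i][j] * v[j] for j in range(n)) for i in range(n)]
    return Fraction(sum(v[i] * sv[i] for i in range(n)), sum(x * x for x in v))


a = Fraction(1, 4)
S = mix(a, 4)
R = remainder(S, a)
```

For this particular choice the alternating vector $(1,-1,1,-1)$ is an eigenvector of $S$ with eigenvalue $-1 + a$, so the bound $\delta_- \ge a$ is attained in the toy example.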
B. Proof of Theorems 4.6 and 4.7

Theorems 4.6 and 4.7 are essentially simple special cases of the much more involved, and general, fluctuation averaging estimate from [6]. Nevertheless, here we give the details of the proofs because (a) they do not strictly follow from the formulation of the result in [6], and (b) their proof is much easier than that of [6], so that the reader only interested in the applications of fluctuation averaging to the local semicircle law need not read the lengthy proof of [6]. We start with a simple lemma which summarizes the key properties of $\prec$ when combined with expectation.

Lemma B.1. Suppose that the deterministic control parameter $\Psi$ satisfies $\Psi \ge N^{-C}$, and that for all $p$ there is a constant $C_p$ such that the nonnegative random variable $X$ satisfies $\mathbb{E} X^p \le N^{C_p}$. Suppose moreover that $X \prec \Psi$. Then for any fixed $n \in \mathbb{N}$ we have

$$\mathbb{E} X^n \prec \Psi^n. \qquad (B.1)$$

(Note that this estimate involves deterministic quantities only, i.e. it means that $\mathbb{E} X^n \le N^{\epsilon}\Psi^n$ for any $\epsilon > 0$ if $N \ge N_0(n, \epsilon)$.) Moreover, we have

$$P_i X^n \prec \Psi^n, \qquad Q_i X^n \prec \Psi^n \qquad (B.2)$$

uniformly in $i$. If $X = X(u)$ and $\Psi = \Psi(u)$ depend on some parameter $u$ and the above assumptions are uniform in $u$, then so are the conclusions.

Proof of Lemma B.1. It is enough to consider the case $n = 1$; the case of larger $n$ follows immediately from the case $n = 1$, using the basic properties of $\prec$ from Lemma 4.4. For the first claim, pick $\epsilon > 0$. Then

$$\mathbb{E} X = \mathbb{E} X \mathbf{1}(X \le N^{\epsilon}\Psi) + \mathbb{E} X \mathbf{1}(X > N^{\epsilon}\Psi) \le N^{\epsilon}\Psi + \sqrt{\mathbb{E} X^2}\sqrt{\mathbb{P}(X > N^{\epsilon}\Psi)} \le N^{\epsilon}\Psi + N^{C_2/2 - D/2}$$

for arbitrary $D > 0$. The first claim therefore follows by choosing $D$ large enough.

The second claim follows from Chebyshev's inequality, using a high-moment estimate combined with Jensen's inequality for partial expectation. We omit the details, which are similar to those of the first claim. □



We shall apply Lemma B.l to resolvent entries of G. In order to verify its assumptions, we record the 
following bounds. 



satisfying (4 



Lemma B.2. Suppose that A -< ^ and Aq -< for some deterministic control parameters 5* and 'i'o both 
Fix p G N. Then for any i ^ j and T C {1, . 



(T) 



I fT") I 

Moreover, we have the rough hounds \Gi^ \ ^ M and 

1 



E 



G). 



(T) 



G 



, A^} satisfying |T| ^ p and i,j^T we have 
= 0^(1). (B.3) 



for any £ > and N ^ Nq^u, e). 



(B.4) 



Proof. The bounds (B.3) follow easily by a repeated application of (4.6), the assumption $\Lambda \prec M^{-c}$ and the lower bound in (4.2). The deterministic bound $|G_{ij}^{(T)}| \le M$ follows immediately from $\eta \ge M^{-1}$ by definition of a spectral domain.

In order to prove (B.4 1, we use Schur's complement formula (5.6 1 applied to l/G^-p, where the expectation 
is estimated using ( |2.6[ ) and | G ' | 



< M. (Recall ^^.) This gives 
1 



E 



G 



(T) 



for all p e N. Since 1/g|P -< 1, therefore follows from (|RT) 



□

Proof of Theorem 4.7. First we claim that, for any fixed $p \in \mathbb{N}$, we have

$$\Big|Q_k \frac{1}{G_{kk}^{(T)}}\Big| \prec \Psi_o \qquad (B.5)$$

uniformly for $T \subset \{1, \dots, N\}$, $|T| \le p$, and $k \notin T$. To simplify notation, for the proof we set $T = \emptyset$; the proof for nonempty $T$ is the same. From Schur's complement formula (5.6) we get $|Q_k (G_{kk})^{-1}| \le |h_{kk}| + |Z_k|$. The first term is estimated by $|h_{kk}| \prec M^{-1/2} \le \Psi_o$. The second term is estimated exactly as in (5.13) and (5.14):

\ 1/2 



\Zk\ < 



where in the last step we used that $|G_{xy}^{(k)}| \prec \Psi_o$ as follows from (B.3), and the bound $1/|G_{kk}| \prec 1$ (recall that $\Psi \le M^{-c}$). This concludes the proof of (B.5).

Abbreviate $X_k := Q_k (G_{kk})^{-1}$. We shall estimate $\sum_k t_{ik} X_k$ in probability by estimating its $p$-th moment by $\Psi_o^{2p}$, from which the claim will easily follow using Chebyshev's inequality. Before embarking on the estimate for arbitrary $p$, we illustrate its idea by estimating the variance

$$\mathbb{E}\Big|\sum_k t_{ik} X_k\Big|^2 = \sum_k t_{ik}^2\,\mathbb{E}|X_k|^2 + \sum_{k \ne l} t_{ik} t_{il}\,\mathbb{E} X_k \overline{X_l}. \qquad (B.6)$$

Using Lemma B.1 and the bounds (4.9) on $t_{ik}$, we find that the first term on the right-hand side of (B.6) is $O_\prec(M^{-1}\Psi_o^2) = O_\prec(\Psi_o^4)$, where we used the estimate (4.8). Let us therefore focus on the second term of (B.6). Using the fact that $k \ne l$, we apply (4.6) to $X_k$ and $X_l$ to get

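$$\mathbb{E} X_k \overline{X_l} = \mathbb{E}\,Q_k\Big(\frac{1}{G_{kk}}\Big)\overline{Q_l\Big(\frac{1}{G_{ll}}\Big)} = \mathbb{E}\,Q_k\Big(\frac{1}{G_{kk}^{(l)}} - \frac{G_{kl}G_{lk}}{G_{kk}G_{kk}^{(l)}G_{ll}}\Big)\overline{Q_l\Big(\frac{1}{G_{ll}^{(k)}} - \frac{G_{lk}G_{kl}}{G_{ll}G_{ll}^{(k)}G_{kk}}\Big)}. \qquad (B.7)$$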

We multiply out the parentheses on the right-hand side. The crucial observation is that if the random variable $Y$ is independent of $i$ (see Definition 4.2) then $\mathbb{E}\,Q_i(X)Y = \mathbb{E}\,Q_i(XY) = 0$. Hence out of the four terms obtained from the right-hand side of (B.7), the only nonvanishing one is

$$\mathbb{E}\,Q_k\Big(\frac{G_{kl}G_{lk}}{G_{kk}G_{kk}^{(l)}G_{ll}}\Big)\overline{Q_l\Big(\frac{G_{lk}G_{kl}}{G_{ll}G_{ll}^{(k)}G_{kk}}\Big)} \prec \Psi_o^4.$$

Together with (4.9), this concludes the proof of $\mathbb{E}\big|\sum_k t_{ik} X_k\big|^2 \prec \Psi_o^4$.

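After this pedagogical interlude we move on to the full proof. Fix some even integer $p$ and write

$$\mathbb{E}\Big|\sum_k t_{ik} X_k\Big|^p = \sum_{k_1, \dots, k_p} t_{ik_1} \cdots t_{ik_p}\,\mathbb{E} X_{k_1} \cdots X_{k_{p/2}} \overline{X_{k_{p/2+1}}} \cdots \overline{X_{k_p}}.$$

Next, we regroup the terms in the sum over $\mathbf{k} := (k_1, \dots, k_p)$ according to the partition of $\{1, \dots, p\}$ generated by the indices $\mathbf{k}$. To that end, let $\mathfrak{P}_p$ denote the set of partitions of $\{1, \dots, p\}$, and $P(\mathbf{k})$ the element of $\mathfrak{P}_p$ defined by the equivalence relation $r \sim s$ if and only if $k_r = k_s$. In short, we reorganize the summation according to coincidences among the indices $\mathbf{k}$. Then we write

$$\mathbb{E}\Big|\sum_k t_{ik} X_k\Big|^p = \sum_{P \in \mathfrak{P}_p} \sum_{\mathbf{k}} t_{ik_1} \cdots t_{ik_p}\,\mathbf{1}(P(\mathbf{k}) = P)\,V(\mathbf{k}), \qquad V(\mathbf{k}) := \mathbb{E} X_{k_1} \cdots X_{k_{p/2}} \overline{X_{k_{p/2+1}}} \cdots \overline{X_{k_p}}. \qquad (B.8)$$

Fix $\mathbf{k}$ and set $P := P(\mathbf{k})$ to be the partition induced by the coincidences in $\mathbf{k}$. For any $r \in \{1, \dots, p\}$ we denote by $[r]$ the block of $r$ in $P$. Let $L \equiv L(P) := \{r : [r] = \{r\}\} \subset \{1, \dots, p\}$ be the set of "lone" labels. We denote by $\mathbf{k}_L := (k_r)_{r \in L}$ the summation indices associated with lone labels.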
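The bookkeeping of partitions and lone labels can be made concrete with a few lines of code. This is a minimal sketch with hypothetical helper names (`induced_partition`, `lone_labels`, `check_block_count_inequality`), none of which appear in the paper. It computes the partition induced by the coincidences in an index tuple, extracts the lone labels, and checks by brute force the counting inequality $p - |P| \ge (p - |L|)/2$ that is used later in the proof.

```python
from itertools import product


def induced_partition(k):
    """Blocks of labels 1..p, where labels r and s share a block iff k_r == k_s."""
    blocks = {}
    for r, index in enumerate(k, start=1):
        blocks.setdefault(index, []).append(r)
    return sorted(blocks.values())


def lone_labels(partition):
    """Labels whose block is a singleton, i.e. indices that coincide with no other index."""
    return {block[0] for block in partition if len(block) == 1}


def check_block_count_inequality(p, n_values):
    """Check p - |P| >= (p - |L|)/2 for every index tuple k in {0..n_values-1}^p."""
    for k in product(range(n_values), repeat=p):
        partition = induced_partition(k)
        lone = lone_labels(partition)
        if 2 * (p - len(partition)) < p - len(lone):
            return False
    return True
```

For example, the tuple `(3, 5, 3, 7)` induces the blocks `[1, 3]`, `[2]`, `[4]`, so the lone labels are 2 and 4. The inequality holds because every block that is not a lone label contains at least two labels.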

The resolvent entry $G_{kk}$ depends strongly on the randomness in the $k$-column of $H$, but only weakly on the randomness in the other columns. We conclude that if $r$ is a lone label then all factors $X_{k_s}$ with $s \ne r$ in $V(\mathbf{k})$ depend weakly on the randomness in the $k_r$-th column of $H$. Thus, the idea is to make all resolvent entries inside the expectation of $V(\mathbf{k})$ as independent of the indices $\mathbf{k}_L$ as possible (see Definition 4.2), using the identity (4.6). To that end, we say that a resolvent entry $G_{xy}^{(T)}$ with $x, y \notin T$ is maximally expanded if $\mathbf{k}_L \subset T \cup \{x, y\}$. The motivation behind this definition is that using (4.6) we cannot add upper indices from the set $\mathbf{k}_L$ to a maximally expanded resolvent entry. We shall apply (4.6) to all resolvent entries in $V(\mathbf{k})$. In this manner we generate a sum of monomials consisting of off-diagonal resolvent entries and inverses of diagonal resolvent entries. We can now repeatedly apply (4.6) to each factor until either they are all maximally expanded or a sufficiently large number of off-diagonal resolvent entries has been generated. The cap on the number of off-diagonal entries is introduced to ensure that this procedure terminates after a finite number of steps.

In order to define the precise algorithm, let $\mathcal{A}$ denote the set of monomials in the off-diagonal entries $G_{xy}^{(T)}$, with $T \subset \mathbf{k}_L$, $x \ne y$, and $x, y \in \mathbf{k} \setminus T$, as well as the inverse diagonal entries $1/G_{xx}^{(T)}$, with $T \subset \mathbf{k}_L$ and $x \in \mathbf{k} \setminus T$. Starting from $V(\mathbf{k})$, the algorithm will recursively generate sums of monomials in $\mathcal{A}$. Let $d(A)$ denote the number of off-diagonal entries in $A \in \mathcal{A}$. For $A \in \mathcal{A}$ we shall define $w_0(A), w_1(A) \in \mathcal{A}$ satisfying

$$A = w_0(A) + w_1(A), \qquad d(w_0(A)) = d(A), \qquad d(w_1(A)) \ge \max\{2, d(A) + 1\}. \qquad (B.9)$$

The idea behind this splitting is to use (4.6) on one entry of $A$; the first term on the right-hand side of (4.6) gives rise to $w_0(A)$ and the second to $w_1(A)$. The precise definition of the algorithm applied to $A \in \mathcal{A}$ is as follows.



(1) If all factors of $A$ are maximally expanded or $d(A) \ge p + 1$ then stop the expansion of $A$. In other words, the algorithm cannot be applied to $A$ in the future.

(2) Otherwise choose some (arbitrary) factor of $A$ that is not maximally expanded. If this entry is off-diagonal, $G_{xy}^{(T)}$, write

$$G_{xy}^{(T)} = G_{xy}^{(Tu)} + \frac{G_{xu}^{(T)} G_{uy}^{(T)}}{G_{uu}^{(T)}} \qquad (B.10)$$

for the smallest $u \in \mathbf{k}_L \setminus (T \cup \{x, y\})$. If the chosen entry is diagonal, $1/G_{xx}^{(T)}$, write

$$\frac{1}{G_{xx}^{(T)}} = \frac{1}{G_{xx}^{(Tu)}} - \frac{G_{xu}^{(T)} G_{ux}^{(T)}}{G_{xx}^{(T)} G_{xx}^{(Tu)} G_{uu}^{(T)}} \qquad (B.11)$$

for the smallest $u \in \mathbf{k}_L \setminus (T \cup \{x\})$. Then the splitting $A = w_0(A) + w_1(A)$ is defined by the splitting induced by (B.10) or (B.11), in the sense that we replace the factor $G_{xy}^{(T)}$ or $1/G_{xx}^{(T)}$ in the monomial $A$ by the right-hand sides of (B.10) or (B.11).

(This algorithm contains some arbitrariness in the choice of the factor of $A$ to be expanded. It may be removed for instance by first fixing some ordering of all resolvent entries $G_{ij}^{(T)}$. Then in (2) we choose the first factor of $A$ that is not maximally expanded.) Note that (B.10) and (B.11) follow from (4.6). It is clear that (B.9) holds with the algorithm just defined.

We now apply this algorithm recursively to each entry $A^r := 1/G_{k_r k_r}$ in the definition of $V(\mathbf{k})$. More precisely, we start with $A^r$ and define $A^r_0 := w_0(A^r)$ and $A^r_1 := w_1(A^r)$. In the second step of the algorithm we define four monomials

$$A^r_{00} := w_0(A^r_0), \qquad A^r_{10} := w_0(A^r_1), \qquad A^r_{01} := w_1(A^r_0), \qquad A^r_{11} := w_1(A^r_1),$$

and so on, at each iteration performing the steps (1) and (2) on each new monomial independently of the others. Note that the lower indices are binary sequences that describe the recursive application of the operations $w_0$ and $w_1$. In this manner we generate a binary tree whose vertices are given by finite binary strings $\sigma$. The associated monomials satisfy $A^r_{\sigma i} := w_i(A^r_\sigma)$ for $i = 0, 1$, where $\sigma i$ denotes the binary string obtained by appending $i$ to the right end of $\sigma$. See Figure B.1 for an illustration of the tree.

We stop the recursion of a tree vertex whenever the associated monomial satisfies the stopping rule of step (1). In other words, the set of leaves of the tree is the set of binary strings $\sigma$ such that either all factors of $A^r_\sigma$ are maximally expanded or $d(A^r_\sigma) \ge p + 1$. We claim that the resulting binary tree is finite, i.e. that the algorithm always reaches step (1) after a finite number of iterations. Indeed, by the stopping rule in (1), we have $d(A^r_\sigma) \le p + 1$ for any vertex $\sigma$ of the tree. Since each application of $w_1$ increases $d(\cdot)$ by at least one, and in the first step (i.e. when applied to $A^r$) by two, we conclude that the number of ones in any $\sigma$ is at most $p$. Since each application of $w_1$ increases the number of resolvent entries by at most four, and the application of $w_0$ does not change this number, we find that the number of resolvent entries in $A^r_\sigma$ is bounded by $4p + 1$. Hence the maximal number of upper indices in $A^r_\sigma$ for any tree vertex $\sigma$ is $(4p + 1)p$. Since each application of $w_0$ increases the total number of upper indices by one, we find that $\sigma$ contains at most $(4p + 1)p$ zeros. We conclude that the maximal length of the string $\sigma$ (i.e. the depth of the tree) is at most $(4p + 1)p + p = 4p^2 + 2p$. A string $\sigma$ encoding a tree vertex contains at most $p$ ones. Denoting by $k$ the number of ones in a string encoding a leaf of the tree, we find that the number of leaves is bounded by $\sum_{k=0}^{p} \binom{4p^2 + 2p}{k} \le (Cp^2)^p$. Therefore, denoting by $\mathfrak{L}_r$ the set of leaves of the binary tree generated from $A^r$, we have $|\mathfrak{L}_r| \le (Cp^2)^p$.

Figure B.1. The binary tree generated by applying the algorithm (1)-(2) to a monomial $A^r$. Each vertex of the tree is indexed by a binary string $\sigma$, and encodes a monomial $A^r_\sigma$. An arrow towards the left represents the action of $w_0$ and an arrow towards the right the action of $w_1$. The monomial $A^r_{11}$ satisfies the assumptions of step (1), and hence its expansion is stopped, so that the tree vertex 11 has no children.
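The counting in this paragraph is elementary and can be checked numerically. The sketch below evaluates the depth bound $4p^2 + 2p$ and the binomial sum bounding the number of leaves, and compares the latter with $(Cp^2)^p$. The function names and the explicit constant $C = 7$ are assumptions chosen for illustration; the text only asserts the existence of some constant $C$.

```python
from math import comb


def max_depth(p):
    """Depth bound from the text: at most (4p + 1)p zeros plus at most p ones."""
    return (4 * p + 1) * p + p


def leaf_count_bound(p):
    """Sum over k = 0..p of binomial(4p^2 + 2p, k): strings of maximal length with k ones."""
    return sum(comb(max_depth(p), k) for k in range(p + 1))


# Compare the binomial sum with (C p^2)^p for the illustrative choice C = 7.
ILLUSTRATIVE_C = 7
bound_holds = all(
    leaf_count_bound(p) <= (ILLUSTRATIVE_C * p * p) ** p for p in range(1, 11)
)
```

For $p = 1$ the sum equals $1 + 6 = 7$, which is why no constant smaller than 7 works in this crude form of the comparison.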

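By definition of the tree and $w_0$ and $w_1$, we have the decomposition

$$A^r = \sum_{\sigma \in \mathfrak{L}_r} A^r_\sigma. \qquad (B.12)$$

Moreover, each monomial $A^r_\sigma$ for $\sigma \in \mathfrak{L}_r$ either consists entirely of maximally expanded resolvent entries or satisfies $d(A^r_\sigma) = p + 1$. (This is an immediate consequence of the stopping rule in (1).)

Next, we observe that for any string $\sigma$ we have

$$Q_{k_r} A^r_\sigma = O_\prec\big(\Psi_o^{b(\sigma) + 1}\big), \qquad (B.13)$$

where $b(\sigma)$ is the number of ones in the string $\sigma$. Indeed, if $b(\sigma) = 0$ then this follows from (B.5); if $b(\sigma) \ge 1$ this follows from the last statement in (B.9) and (B.3).

Using (B.8) and (B.12) we have the representation

$$V(\mathbf{k}) = \sum_{\sigma_1 \in \mathfrak{L}_1} \cdots \sum_{\sigma_p \in \mathfrak{L}_p} \mathbb{E}\,\big(Q_{k_1} A^1_{\sigma_1}\big) \cdots \big(Q_{k_{p/2}} A^{p/2}_{\sigma_{p/2}}\big)\overline{\big(Q_{k_{p/2+1}} A^{p/2+1}_{\sigma_{p/2+1}}\big)} \cdots \overline{\big(Q_{k_p} A^p_{\sigma_p}\big)}. \qquad (B.14)$$

We now claim that any nonzero term on the right-hand side of (B.14) satisfies

$$\Big|\mathbb{E}\,\big(Q_{k_1} A^1_{\sigma_1}\big) \cdots \overline{\big(Q_{k_p} A^p_{\sigma_p}\big)}\Big| \prec \Psi_o^{p + |L|}. \qquad (B.15)$$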



Proof of (B.15). Before embarking on the proof, we explain its idea. By (B.13), the naive size of the left-hand side of (B.15) is $\Psi_o^p$. The key observation is that each lone label $s \in L$ yields one extra factor $\Psi_o$ to the estimate. This is because the expectation in (B.14) would vanish if all other factors $(Q_{k_r} A^r_{\sigma_r})$, $r \ne s$, were independent of $k_s$. The expansion of the binary tree makes this dependence explicit by exhibiting $k_s$ as a lower index. But this requires performing an operation $w_1$ with the choice $u = k_s$ in (B.10) or (B.11). However, $w_1$ increases the number of off-diagonal elements by at least one. In other words, every index associated with a lone label must have a "partner" index in a different resolvent entry which arose by application of $w_1$. Such a partner index may only be obtained through the creation of at least one off-diagonal resolvent entry. The actual proof below shows that this effect applies cumulatively for all lone labels.

In order to prove (B.15), we consider two cases. Consider first the case where for some $r = 1, \dots, p$ the monomial $A^r_{\sigma_r}$ on the left-hand side of (B.15) is not maximally expanded. Then $d(A^r_{\sigma_r}) = p + 1$, so that (B.3) yields $A^r_{\sigma_r} \prec \Psi_o^{p+1}$. Therefore the observation that $Q_{k_s} A^s_{\sigma_s} \prec \Psi_o$ for all $s \ne r$, together with (B.2), implies that the left-hand side of (B.15) is $O_\prec(\Psi_o^{2p})$. Since $|L| \le p$, (B.15) follows.

Consider now the case where $A^r_{\sigma_r}$ on the left-hand side of (B.15) is maximally expanded for all $r = 1, \dots, p$. The key observation is the following claim about the left-hand side of (B.15) with a nonzero expectation.

(*) For each $s \in L$ there exists $r = \tau(s) \in \{1, \dots, p\} \setminus \{s\}$ such that the monomial $A^r_{\sigma_r}$ contains a resolvent entry with lower index $k_s$.

In other words, after expansion, the lone label $s$ has a "partner" label $r = \tau(s)$, such that the index $k_s$ appears also in the expansion of $A^r$ (note that there may be several such partner labels $r$). To prove (*), suppose by contradiction that there exists an $s \in L$ such that for all $r \in \{1, \dots, p\} \setminus \{s\}$ the lower index $k_s$ does not appear in the monomial $A^r_{\sigma_r}$. To simplify notation, we assume that $s = 1$. Then, for all $r = 2, \dots, p$, since $A^r_{\sigma_r}$ is maximally expanded, we find that $A^r_{\sigma_r}$ is independent of $k_1$ (see Definition 4.2). Therefore we have

$$\mathbb{E}\,\big(Q_{k_1} A^1_{\sigma_1}\big)\prod_{r=2}^{p}\big(Q_{k_r} A^r_{\sigma_r}\big) = \mathbb{E}\,Q_{k_1}\Big(A^1_{\sigma_1}\prod_{r=2}^{p}\big(Q_{k_r} A^r_{\sigma_r}\big)\Big) = 0$$

(complex conjugates omitted from the notation), where in the last step we used that $\mathbb{E}\,Q_i(X)Y = \mathbb{E}\,Q_i(XY) = 0$ if $Y$ is independent of $i$. This concludes the proof of (*).

For $r \in \{1, \dots, p\}$ we define $\ell(r) := \sum_{s \in L} \mathbf{1}(\tau(s) = r)$, the number of times that the label $r$ was chosen as a partner to some lone label $s$. We now claim that

$$Q_{k_r} A^r_{\sigma_r} = O_\prec\big(\Psi_o^{1 + \ell(r)}\big). \qquad (B.16)$$

To prove (B.16), fix $r \in \{1, \dots, p\}$. By definition, for each $s \in \tau^{-1}(\{r\})$ the index $k_s$ appears as a lower index in the monomial $A^r_{\sigma_r}$. Since $s \in L$ is by definition a lone label and $s \ne r$, we know that $k_s$ does not appear as an index in $A^r$. By definition of the monomials associated with the tree vertex $\sigma_r$, it follows that $b(\sigma_r)$, the number of ones in $\sigma_r$, is at least $|\tau^{-1}(\{r\})| = \ell(r)$ since each application of $w_1$ adds precisely one new (lower) index. Note that in this step it is crucial that $s \in \tau^{-1}(\{r\})$ was a lone label. Recalling (B.13), we therefore get (B.16).

Using (B.16) and Lemma B.1 we find

$$\Big|\mathbb{E}\,\big(Q_{k_1} A^1_{\sigma_1}\big) \cdots \overline{\big(Q_{k_p} A^p_{\sigma_p}\big)}\Big| \prec \prod_{r=1}^{p} \Psi_o^{1 + \ell(r)} = \Psi_o^{p + |L|}.$$

This concludes the proof of (B.15). □

Summing over the binary trees in (B.14) and using Lemma B.1 we get from (B.15)

$$V(\mathbf{k}) = O_\prec\big(\Psi_o^{p + |L|}\big). \qquad (B.17)$$



We now return to the sum (B.8). We perform the summation by first fixing $P \in \mathfrak{P}_p$, with associated lone labels $L = L(P)$. We find

$$\sum_{\mathbf{k}} \mathbf{1}(P(\mathbf{k}) = P)\,t_{ik_1} \cdots t_{ik_p} \le (M^{-1})^{p - |P|} \le (M^{-1/2})^{p - |L|};$$

in the first step we used (4.9) and the fact that the summation is performed over $|P|$ free indices, the remaining $p - |P|$ being estimated by $M^{-1}$; in the second step we used that each block of $P$ that is not contained in $L$ consists of at least two labels, so that $p - |P| \ge (p - |L|)/2$. From (B.8) and (B.17) we get

$$\mathbb{E}\Big|\sum_k t_{ik} X_k\Big|^p \prec \sum_{P \in \mathfrak{P}_p} (M^{-1/2})^{p - |L(P)|}\,\Psi_o^{p + |L(P)|} \le C_p \Psi_o^{2p},$$

where in the last step we used the lower bound from (4.8) and estimated the summation over $\mathfrak{P}_p$ with a constant $C_p$ (which is bounded by $(Cp^2)^p$). Summarizing, we have proved that

$$\mathbb{E}\Big|\sum_k t_{ik} X_k\Big|^p \prec \Psi_o^{2p} \qquad (B.18)$$

for any $p \in 2\mathbb{N}$.

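We conclude the proof of Theorem 4.7 with a simple application of Chebyshev's inequality. Fix $\epsilon > 0$ and $D > 0$. Using (B.18) and Chebyshev's inequality we find

$$\mathbb{P}\Big(\Big|\sum_k t_{ik} X_k\Big| > N^{\epsilon}\Psi_o^2\Big) \le N\,N^{-\epsilon p}$$

for large enough $N \ge N_0(\epsilon, p)$. Choosing $p \ge \epsilon^{-1}(1 + D)$ concludes the proof of Theorem 4.7. □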



Proof of Theorem 4.6. The first estimate of (4.11) follows from Theorem 4.7 and the simple bound $\Lambda_o \le \Lambda \prec \Psi$. The second estimate of (4.11) may be proved by following the proof of Theorem 4.7 verbatim; the only modification is the bound

which replaces (B.5). Here we again use the same upper bound $\Psi_o = \Psi$ for $\Lambda$ and $\Lambda_o$.

In order to prove the first estimate of (4.12), we write Schur's complement formula (5.6) using (2.8) as

= ~ + ' H h^kC'^kihu - TO . 
'^11 III' \ , , / 



(B.19) 



Since $|h_{ii}| \prec M^{-1/2} \le \Psi$ and $|1/G_{ii} - 1/m| \prec \Psi$, we find that the term in parentheses is stochastically bounded by $\Psi$. Therefore we get, inverting (B.19) and expanding the right-hand side, that



Taking the partial expectation Pi yields 



{i) 



{i) 



-hu + KkG^^'hu - m + 0^(*2) . 



k,l 



where in the second step we used (4.6|, (2.2), and (B.3|. Therefore we get, using (4.11) and QiGu = 
Qi{Gu - m) = QiVi, 



i,k 



i,k 



where in the last step we used that the matrices $T$ and $S$ commute by assumption. Introducing the vector $w = (w_a)_{a=1}^{N}$ we therefore have the equation

$$w = m^2 S w + O_\prec(\Psi^2) \qquad (B.20)$$

where the error term is in the sense of the $\ell^\infty$-norm (uniform in the components of the vector $w$). Inverting the matrix $1 - m^2 S$ and recalling the definition (2.10) yields the first estimate of (4.12).

The proof of the second estimate of (4.12) is similar, except that we have to treat the subspace $e^\perp$ separately. We write



N 

and apply the above argument to each term separately. Notice that here we used that the weights exactly 
sum up to one; see (4.9). This yields 



Ytaiivi~[v]) = ™^ tgj 



SikVk - m 



i k i,k 



where we used (2.3) in the second step. Note that the error term on the right-hand side is perpendicular to $e$ when regarded as a vector indexed by $a$, since all other terms in the equation are. Hence we may invert the matrix $(1 - m^2 S)$ on the subspace $e^\perp$, as above, to get the second estimate of (4.12). □



We conclude this section with an alternative proof of Theorem 4.7. While the underlying argument
remains similar, the following proof makes use of an additional decomposition of the space of random
variables, which avoids the use of the stopping rule from Step (1) in the above proof of Theorem 4.7.
This decomposition may be regarded as an abstract reformulation of the stopping rule.



Alternative proof of Theorem 4.7. As before, we set $X_k := Q_k (G_{kk})^{-1}$. For simplicity of presentation,
we set $t_{ik} = N^{-1}$. The decomposition is defined using the operations $P_i$ and $Q_i$, introduced in
Definition 4.2. It is immediate that $P_i$ and $Q_i$ are projections, that $P_i + Q_i = 1$, and that all of these
projections commute with each other. For a set $A \subset \{1, \dots, N\}$ we use the notations $P_A := \prod_{i \in A} P_i$ and
$Q_A := \prod_{i \in A} Q_i$.

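Let $p$ be even and introduce the shorthand $\widetilde X_{k_s} := X_{k_s}$ for $s \le p/2$ and $\widetilde X_{k_s} := \overline{X}_{k_s}$ for $s > p/2$. Then
we get

$$\mathbb{E} \Biggl| \frac{1}{N} \sum_k X_k \Biggr|^p = \frac{1}{N^p} \sum_{k_1, \dots, k_p} \mathbb{E} \prod_{s=1}^p \widetilde X_{k_s} = \frac{1}{N^p} \sum_{k_1, \dots, k_p} \mathbb{E} \prod_{s=1}^p \Biggl( \prod_{r=1}^p (P_{k_r} + Q_{k_r}) \widetilde X_{k_s} \Biggr).$$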



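Introducing the notations $\mathbf{k} = (k_1, \dots, k_p)$ and $[\mathbf{k}] = \{k_1, \dots, k_p\}$, we therefore get by multiplying out the
parentheses

$$\mathbb{E} \Biggl| \frac{1}{N} \sum_k X_k \Biggr|^p = \frac{1}{N^p} \sum_{\mathbf{k}} \sum_{A_1, \dots, A_p \subset [\mathbf{k}]} \mathbb{E} \prod_{s=1}^p \bigl( P_{A_s^c} Q_{A_s} \widetilde X_{k_s} \bigr). \tag{B.21}$$

Here $A_s^c$ denotes the complement $[\mathbf{k}] \setminus A_s$.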



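Next, by definition of $\widetilde X_{k_s}$, we have that $\widetilde X_{k_s} = Q_{k_s} \widetilde X_{k_s}$, which implies that $P_{A_s^c} \widetilde X_{k_s} = 0$ if $k_s \notin A_s$.
Hence we may restrict the summation to $A_s$ satisfying

$$k_s \in A_s \tag{B.22}$$

for all $s$. Moreover, we claim that the right-hand side of (B.21) vanishes unless

$$k_s \in \bigcup_{q \ne s} A_q \tag{B.23}$$

for all $s$. Indeed, suppose that $k_s \in \bigcap_{q \ne s} A_q^c$ for some $s$, say $s = 1$. In this case, for each $s = 2, \dots, p$, the
factor $P_{A_s^c} Q_{A_s} \widetilde X_{k_s}$ is independent of $k_1$ (see Definition 4.2). Thus we get

$$\begin{aligned}
\mathbb{E} \prod_{s=1}^p \bigl( P_{A_s^c} Q_{A_s} \widetilde X_{k_s} \bigr)
&= \mathbb{E} \bigl( P_{A_1^c} Q_{A_1} Q_{k_1} \widetilde X_{k_1} \bigr) \prod_{s=2}^p \bigl( P_{A_s^c} Q_{A_s} \widetilde X_{k_s} \bigr) \\
&= \mathbb{E}\, Q_{k_1} \Biggl( \bigl( P_{A_1^c} Q_{A_1} \widetilde X_{k_1} \bigr) \prod_{s=2}^p \bigl( P_{A_s^c} Q_{A_s} \widetilde X_{k_s} \bigr) \Biggr) \\
&= 0,
\end{aligned}$$

where in the last step we used that $\mathbb{E} Q_i(X) = 0$ for any $i$ and random variable $X$.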






We conclude that the summation on the right-hand side of (B.21) is restricted to indices satisfying (B.22)
and (B.23). Under these two conditions we have

$$\sum_{s=1}^p |A_s| \ge 2\, |[\mathbf{k}]|, \tag{B.24}$$

since each index $k_s$ must belong to at least two different sets $A_q$: to $A_s$ (by (B.22)) as well as to some $A_q$
with $q \ne s$ (by (B.23)).
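
The counting bound (B.24) can be checked by brute force for small parameters. The following Python sketch is an illustration only and not part of the original argument; the function names `subsets` and `check_b24` are hypothetical. It enumerates all index tuples $\mathbf{k}$ with entries in a small range and all families $(A_1, \dots, A_p)$ of subsets of $[\mathbf{k}]$, keeps those satisfying (B.22) and (B.23), and asserts the inequality for each of them.

```python
from itertools import combinations, product


def subsets(items):
    """Return all subsets of `items` as frozensets."""
    items = list(items)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def check_b24(p, n_values):
    """Brute-force check of sum_s |A_s| >= 2 |[k]| under (B.22) and (B.23).

    Returns the number of admissible configurations (k, A_1, ..., A_p) found.
    """
    checked = 0
    for k in product(range(n_values), repeat=p):
        support = set(k)  # the set [k] of distinct indices
        for A in product(subsets(support), repeat=p):
            # (B.22): k_s belongs to A_s for all s
            b22 = all(k[s] in A[s] for s in range(p))
            # (B.23): k_s belongs to some A_q with q != s, for all s
            b23 = all(any(k[s] in A[q] for q in range(p) if q != s)
                      for s in range(p))
            if b22 and b23:
                assert sum(len(a) for a in A) >= 2 * len(support)
                checked += 1
    return checked
```

Calling, for instance, `check_b24(3, 3)` runs through a few thousand configurations without triggering the assertion. This is of course no substitute for the one-line argument given above.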



Next, we claim that for $k \in A$ we have

$$|Q_A X_k| \prec \Psi_o^{|A|}. \tag{B.25}$$

(Note that if we were doing the case $X_k = Q_k G_{kk}$ instead of $X_k = Q_k (G_{kk})^{-1}$, then (B.25) would have to
be weakened to $|Q_A X_k| \prec \Psi \Psi_o^{|A| - 1}$, in accordance with (4.11). Indeed, in that case and for $A = \{k\}$, we only
have the bound $|Q_k G_{kk}| \prec \Psi$ and not $|Q_k G_{kk}| \prec \Psi_o$.)



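Before proving (B.25), we show it may be used to complete the proof. Using (B.21), (B.25), and Lemma
B.1 we find

$$\mathbb{E} \Biggl| \frac{1}{N} \sum_k X_k \Biggr|^p \prec C_p \frac{1}{N^p} \sum_{\mathbf{k}} \Psi_o^{2 |[\mathbf{k}]|} = C_p \sum_{u=1}^p \Psi_o^{2u} \frac{1}{N^p} \sum_{\mathbf{k}} \mathbf{1}\bigl( |[\mathbf{k}]| = u \bigr) \le C_p \sum_{u=1}^p \Psi_o^{2u} N^{u - p} \le C_p \bigl( \Psi_o + N^{-1/2} \bigr)^{2p} \le C_p \Psi_o^{2p},$$

where in the first step we estimated the summation over the sets $A_1, \dots, A_p$ by a combinatorial factor $C_p$
depending on $p$, in the fourth step we used the elementary inequality $a^n b^m \le (a + b)^{n+m}$ for positive $a, b$, and
in the last step we used (4.8) and the bound $M \le N$. Thus we have proved (B.18), from which the claim
follows exactly as in the first proof of Theorem 4.7.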



What remains is the proof of (B.25). The case $|A| = 1$ (corresponding to $A = \{k\}$) follows from (B.5),
exactly as in the first proof of Theorem 4.7. To simplify notation, for the case $|A| \ge 2$ we assume that $k = 1$
and $A = \{1, 2, \dots, t\}$ with $t \ge 2$. It suffices to prove that

$$\Biggl| Q_t \cdots Q_2 \frac{1}{G_{11}} \Biggr| \prec \Psi_o^t. \tag{B.26}$$



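We start by writing, using (4.6),

$$Q_2 \frac{1}{G_{11}} = Q_2 \frac{1}{G_{11}^{(2)}} - Q_2 \frac{G_{12} G_{21}}{G_{11} G_{11}^{(2)} G_{22}} = -Q_2 \frac{G_{12} G_{21}}{G_{11} G_{11}^{(2)} G_{22}},$$

where the first term vanishes since $G_{11}^{(2)}$ is independent of $2$ (see Definition 4.2). We now consider

$$Q_3 Q_2 \frac{1}{G_{11}} = -Q_2 Q_3 \frac{G_{12} G_{21}}{G_{11} G_{11}^{(2)} G_{22}}$$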



and apply (4.6) with $k = 3$ to each resolvent entry on the right-hand side, and multiply everything out. The
result is a sum of fractions of entries of $G$, whereby all entries in the numerator are off-diagonal and all entries
in the denominator are diagonal. The leading order term vanishes,

$$Q_2 Q_3 \frac{G_{12}^{(3)} G_{21}^{(3)}}{G_{11}^{(3)} G_{11}^{(23)} G_{22}^{(3)}} = 0,$$

so that the surviving terms have at least three (off-diagonal) resolvent entries in the numerator. We may
now continue in this manner; at each step the number of (off-diagonal) resolvent entries in the numerator
increases by at least one.
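
The first expansion step above is an exact algebraic identity, which can be verified in rational arithmetic. The sketch below is an illustration under the assumption that (4.6) is the standard resolvent identity $G_{ij}^{(k)} = G_{ij} - G_{ik} G_{kj} / G_{kk}$, where $G^{(k)}$ is the resolvent of the minor of $H$ with the $k$-th row and column removed; the helper names `inverse`, `resolvent` and `check_expansion` are hypothetical. A real rational spectral parameter is used only to keep the arithmetic exact.

```python
from fractions import Fraction


def inverse(mat):
    """Invert a square matrix of Fractions by Gauss-Jordan elimination."""
    n = len(mat)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(mat)]
    for col in range(n):
        # find a nonzero pivot and move it onto the diagonal
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def resolvent(h, z, removed=()):
    """Entries of (H - z)^{-1} for the minor of H without the rows and
    columns in `removed`, as a dict keyed by the original index pairs."""
    keep = [i for i in range(len(h)) if i not in removed]
    m = [[h[i][j] - (z if i == j else 0) for j in keep] for i in keep]
    inv = inverse(m)
    return {(i, j): inv[a][b]
            for a, i in enumerate(keep) for b, j in enumerate(keep)}


def check_expansion(h, z):
    """Check 1/G_11 = 1/G_11^(2) - G_12 G_21 / (G_11 G_11^(2) G_22).

    Indices are zero-based here: position 0 plays the role of "1" and
    position 1 the role of "2".
    """
    g = resolvent(h, z)
    g2 = resolvent(h, z, removed=(1,))
    lhs = 1 / g[0, 0]
    rhs = 1 / g2[0, 0] - g[0, 1] * g[1, 0] / (g[0, 0] * g2[0, 0] * g[1, 1])
    return lhs == rhs
```

For a symmetric matrix with integer entries and a non-integer rational `z`, all matrices involved are invertible, since the eigenvalues of an integer matrix are algebraic integers and hence cannot equal `z`. The check therefore never divides by zero in that setting.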

More formally, we obtain a sequence $A_2, A_3, \dots, A_t$, where $A_2 := -Q_2 \frac{G_{12} G_{21}}{G_{11} G_{11}^{(2)} G_{22}}$ and $A_i$ is obtained by
applying (4.6) with $k = i$ to each entry of $Q_i A_{i-1}$, and keeping only the nonvanishing terms. The following
properties are easy to check by induction.

(i) $A_i = Q_i A_{i-1}$.

(ii) $A_i$ consists of the projection $Q_2 \cdots Q_i$ applied to a sum of fractions such that all entries in the numerator
are off-diagonal and all entries in the denominator are diagonal.

(iii) The number of (off-diagonal) entries in the numerator of each term of $A_i$ is at least $i$.

By Lemma B.1 combined with (ii) and (iii) we conclude that $|A_i| \prec \Psi_o^i$. From (i) we therefore get

$$Q_t \cdots Q_2 \frac{1}{G_{11}} = A_t = O_\prec(\Psi_o^t).$$

This is (B.26). Hence the proof is complete.



□ 



C. Large deviation bounds 



We consider random variables $X$ satisfying

$$\mathbb{E} X = 0, \qquad \mathbb{E} |X|^2 = 1, \qquad \bigl( \mathbb{E} |X|^p \bigr)^{1/p} \le \mu_p \tag{C.1}$$

for all $p \in \mathbb{N}$ and some constants $\mu_p$.



Theorem C.l (Large deviation bounds). Let (X, 



d (Y^ ) be independent families of random 
variables and {o.\^^) and {b[^^) be deterministic; here N €z N and i,j — 1, ... ,7V. Suppose that all entries 



XI" > andY^' are independent and satisfy ( |C.1[ ). Then we have the bounds 

1/2 



i 

YaijX^Yj -< 



'^aijXiXj -< 



i 

E|a» 



1/2 



1/2 



(C.2) 
(C.3) 
(C.4) 



// the coefficients Oj-^'' and bf^^ depend on an additional parameter u, then all of these estimates are uniform 



in u (see Definition 2.1), i.e. the threshold Nq — No[e,D) in the definition of < depends only on the family 
IJp from (2.6) and S from (2.4); in particular, Nq does not depend on u. 
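
As a numerical illustration of the linear bound (C.2), one may simulate a weighted sum of independent signs and compare it with the $\ell^2$-norm of the weights. The sketch below is not part of the original text: the function name `simulate`, the choice of Rademacher variables (which satisfy (C.1)), and the weights $b_i = 1/i$ are assumptions made for the example. Since $\prec$ is an asymptotic notion in $N$, a simulation at fixed size can only indicate the correct scale, not prove the bound.

```python
import math
import random


def simulate(n=100, trials=4000, threshold=3.0, seed=0):
    """Monte Carlo look at S = sum_i b_i X_i / (sum_i |b_i|^2)^{1/2}.

    X_i are independent +-1 signs and b_i = 1/(i+1) (illustrative choices).
    Returns the empirical mean of S^2 and the fraction of trials with
    |S| > threshold.
    """
    rng = random.Random(seed)
    b = [1.0 / (i + 1) for i in range(n)]
    norm = math.sqrt(sum(x * x for x in b))
    mean_square = 0.0
    exceed = 0
    for _ in range(trials):
        s = sum(bi * rng.choice((-1.0, 1.0)) for bi in b) / norm
        mean_square += s * s / trials
        exceed += abs(s) > threshold
    return mean_square, exceed / trials
```

The exact second moment of the normalized sum is $1$, so the first returned value should be close to $1$, and by Chebyshev's inequality the probability of exceeding the threshold $3$ is at most $1/9$; the empirical fraction is typically much smaller.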



Proof. The estimates (C.2), (C.3), and (C.4) follow from Lemmas B.2, B.3, and B.4 of [?], combined with
Chebyshev's inequality. □